Abstract

We  consider  the  following  two  problems.  We  are  given  as  input  a  set  of  activities  and  a  set  of 
jobs  to  complete.  Our  goal  is  to  devise  a  schedule  for  allocating  time  to  the  various  activities  so 
as  to  achieve  one  of  two  objectives:  minimizing  the  average  time  required  to  complete  each  job, 
or maximizing the number of jobs completed within a fixed time T. Formally, a schedule is a
sequence $\langle (v_1, \tau_1), (v_2, \tau_2), \ldots \rangle$, where each pair $(v, \tau)$
represents investing time $\tau$ in activity $v$. We assume that the fraction of jobs completed,
$f$, is a monotone submodular function of the sequence of pairs that appear in a schedule.

In the offline setting in which we have oracle access to $f$, these two objectives give us,
respectively, what we call the Min Sum Submodular Cover problem (which is a generalization of
the  Min  Sum  Set  Cover  problem  and  the  related  Pipelined  Set  Cover  problem)  and  what 
we  call  Budgeted  Maximum  Submodular  Coverage  (which  generalizes  the  problem  of 
maximizing  a  monotone,  submodular  function  subject  to  a  knapsack  constraint). 

We  consider  these  problems  in  the  online  setting,  in  which  the  jobs  arrive  one  at  a  time  and 
we  must  finish  each  job  (via  some  schedule)  before  moving  on  to  the  next.  We  give  an  efficient 
online algorithm for this problem whose worst-case asymptotic performance is simultaneously
optimal for both objectives (unless P = NP), in the sense that its performance ratio (with respect
to  the  optimal  static  schedule)  converges  to  the  best  approximation  ratios  for  the  corresponding 
offline  problems.  Finally,  we  evaluate  this  algorithm  experimentally  by  using  it  to  learn,  online,  a 
schedule  for  allocating  CPU  time  to  the  solvers  entered  in  the  2007  SAT  solver  competition. 


1  Introduction 


This  paper  presents  algorithms  for  solving  a  specific  class  of  online  resource  allocation  problems. 
Our  online  algorithms  can  be  applied  in  environments  where  abstract  jobs  arrive  one  at  a  time, 
and  one  can  complete  the  jobs  by  investing  time  in  a  number  of  abstract  activities.  Provided  that 
the  jobs  and  activities  satisfy  certain  technical  conditions,  our  online  algorithm  is  guaranteed  to 
perform  almost  as  well  as  any  fixed  schedule  for  investing  time  in  the  various  activities,  according 
to two natural measures of performance. As we discuss further in §1.5, our problem formulation
captures a number of previously-studied problems, including selection of algorithm portfolios
[12, 15], selection of restart schedules [14, 23], and database query optimization [5, 25].

1.1  Formal  setup 

The  problem  considered  in  this  paper  can  be  defined  as  follows.  We  are  given  as  input  a  finite 
set $V$ of activities. A pair $(v, \tau) \in V \times \mathbb{R}_{>0}$ is called an action, and
represents spending time $\tau$ performing activity $v$. A schedule is a sequence of actions. We use
$\mathcal{S}$ to denote the set of all schedules. A job is a function $f : \mathcal{S} \to [0, 1]$,
where for any schedule $S \in \mathcal{S}$, $f(S)$ represents the proportion of some task that is
accomplished by performing the sequence of actions $S$. We require that a job $f$ have the
following properties (here $\oplus$ is the concatenation operator):

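1. (monotonicity) for any schedules $S_1, S_2 \in \mathcal{S}$, we have
$f(S_1) \le f(S_1 \oplus S_2)$ and $f(S_2) \le f(S_1 \oplus S_2)$.

2. (submodularity) for any schedules $S_1, S_2 \in \mathcal{S}$ and any action
$a \in V \times \mathbb{R}_{>0}$,

$$f(S_1 \oplus S_2 \oplus \langle a \rangle) - f(S_1 \oplus S_2) \le f(S_1 \oplus \langle a \rangle) - f(S_1) . \tag{1.1}$$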

We will evaluate schedules in terms of two objectives. The first objective is to maximize $f(S)$
subject to the constraint $\ell(S) \le T$, for some fixed $T > 0$, where $\ell(S)$ equals the sum of the
durations of the actions in $S$. For example, if $S = \langle (v_1, 3), (v_2, 3) \rangle$, then
$\ell(S) = 6$. We refer to this problem as Budgeted Maximum Submodular Coverage (the origin of
this terminology is explained in §2).

The  second  objective  is  to  minimize  the  cost  of  a  schedule,  which  we  define  as 

$$c(f, S) = \int_{t=0}^{\infty} 1 - f\left(S_{\langle t \rangle}\right) \, dt \tag{1.2}$$

where $S_{\langle t \rangle}$ is the schedule that results from truncating schedule $S$ at time $t$.
For example, if $S = \langle (v_1, 3), (v_2, 3) \rangle$ then
$S_{\langle 5 \rangle} = \langle (v_1, 3), (v_2, 2) \rangle$.¹ One way to interpret this objective is to
imagine that $f(S)$ is the probability that some desired event occurs as a result of performing the
actions in $S$. For any non-negative random variable $X$, we have
$E[X] = \int_{t=0}^{\infty} P[X > t] \, dt$. Thus $c(f, S)$ is the expected time we must wait before
the event occurs if we execute actions according to the schedule $S$. We refer to the problem of
computing a schedule that minimizes $c(f, S)$ as Min-Sum Submodular Cover.

¹ More formally, if $S = \langle a_1, a_2, \ldots \rangle$, where $a_i = (v_i, \tau_i)$, then
$S_{\langle t \rangle} = \langle a_1, a_2, \ldots, a_{k-1}, a_k, (v_{k+1}, \tau') \rangle$, where $k$
is the largest integer such that $\sum_{i=1}^{k} \tau_i < t$ and $\tau' = t - \sum_{i=1}^{k} \tau_i$.
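The truncation operation and the cost integral (1.2) can be illustrated with a short sketch. It assumes a schedule is stored as a Python list of `(activity, duration)` pairs and approximates the integral with a midpoint rule up to a caller-supplied `horizon`. The names `truncate`, `length`, and `cost`, and the numerical integration itself, are illustrative choices, not part of the paper.

```python
from typing import Callable, List, Tuple

Action = Tuple[str, float]      # (activity, duration); representation is an assumption
Schedule = List[Action]


def length(schedule: Schedule) -> float:
    """Sum of the durations of the actions in the schedule."""
    return sum(tau for _, tau in schedule)


def truncate(schedule: Schedule, t: float) -> Schedule:
    """Return the schedule cut off at time t (the last action may be shortened)."""
    out: Schedule = []
    remaining = t
    for v, tau in schedule:
        if remaining <= 0:
            break
        used = min(tau, remaining)
        out.append((v, used))
        remaining -= used
    return out


def cost(f: Callable[[Schedule], float], schedule: Schedule,
         horizon: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of 1 - f(truncated schedule) over [0, horizon]."""
    dt = horizon / steps
    return sum((1.0 - f(truncate(schedule, (k + 0.5) * dt))) * dt
               for k in range(steps))
```

For the example in the text, `truncate([("v1", 3), ("v2", 3)], 5)` yields `[("v1", 3), ("v2", 2)]`. The `horizon` argument stands in for the infinite upper limit and is only adequate when the integrand is zero beyond it.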

In the online setting, an arbitrary sequence $\langle f_1, f_2, \ldots, f_n \rangle$ of jobs arrive
one at a time, and we must finish each job (via some schedule) before moving on to the next job.
When selecting a schedule $S_i$ to use to finish job $f_i$, we have knowledge of the previous jobs
$f_1, f_2, \ldots, f_{i-1}$ but we have no knowledge of $f_i$ itself or of any subsequent jobs. In this
setting our goal is to develop schedule-selection strategies that minimize regret, which is a
measure of the difference between the average cost (or average coverage) of the schedules produced
by our online algorithm and that of the best single schedule (in hindsight) for the given sequence
of jobs.

The  following  example  illustrates  these  definitions. 

Example 1. Let each activity $v$ represent a randomized algorithm for solving some decision
problem, and let the action $(v, \tau)$ represent running the algorithm (with a fresh random seed)
for time $\tau$. Fix some particular instance of the decision problem, and for any schedule $S$, let
$f(S)$ be the probability that one (or more) of the runs in the sequence $S$ yields a solution to
that instance. So $f(S_{\langle T \rangle})$ is (by definition) the probability that performing the runs
in schedule $S$ yields a solution to the problem instance in time $\le T$, while $c(f, S)$ is the
expected time that elapses before a solution is obtained. It is clear that $f(S)$ satisfies the
monotonicity condition required of a job, because adding runs to the sequence $S$ can only increase
the probability that one of the runs is successful. The fact that $f$ is submodular can be seen as
follows. For any schedule $S$ and action $a$, $f(S \oplus \langle a \rangle) - f(S)$ equals the
probability that action $a$ succeeds after every action in $S$ has failed, which can also be written
as $(1 - f(S)) \cdot f(\langle a \rangle)$. This, together with the monotonicity of $f$, implies
that for any schedules $S_1, S_2$ and any action $a$, we have

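$$\begin{aligned}
f(S_1 \oplus S_2 \oplus \langle a \rangle) - f(S_1 \oplus S_2) &= (1 - f(S_1 \oplus S_2)) \cdot f(\langle a \rangle) \\
&\le (1 - f(S_1)) \cdot f(\langle a \rangle) \\
&= f(S_1 \oplus \langle a \rangle) - f(S_1)
\end{aligned}$$

so $f$ is submodular.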
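The argument in Example 1 can be spot-checked numerically. The sketch below assumes, purely for illustration, that a single fresh run of algorithm `v` for time `tau` succeeds with probability `1 - exp(-RATES[v] * tau)`; the `RATES` table and all function names are hypothetical and not taken from the paper.

```python
import math
import random

# Hypothetical per-algorithm success rates (an assumption for this sketch).
RATES = {"A": 0.5, "B": 0.1}


def run_success(v, tau):
    """Probability that one fresh run of v for time tau succeeds (assumed model)."""
    return 1.0 - math.exp(-RATES[v] * tau)


def f(schedule):
    """Probability that at least one run in the schedule succeeds."""
    fail = 1.0
    for v, tau in schedule:
        fail *= 1.0 - run_success(v, tau)
    return 1.0 - fail


def marginal(prefix, action):
    """Increase in f from appending one action to prefix."""
    return f(prefix + [action]) - f(prefix)


def random_schedule(rng, max_len=4):
    names = sorted(RATES)
    return [(rng.choice(names), rng.uniform(0.1, 5.0))
            for _ in range(rng.randint(0, max_len))]


def check_job_properties(trials=1000, seed=0):
    """Spot-check monotonicity and inequality (1.1) on random schedules."""
    rng = random.Random(seed)
    names = sorted(RATES)
    for _ in range(trials):
        s1, s2 = random_schedule(rng), random_schedule(rng)
        a = (rng.choice(names), rng.uniform(0.1, 5.0))
        assert f(s1) <= f(s1 + s2) + 1e-12
        assert f(s2) <= f(s1 + s2) + 1e-12
        assert marginal(s1 + s2, a) <= marginal(s1, a) + 1e-12
    return True
```

A random check of this kind is not a proof; it only confirms that the identity $f(S \oplus \langle a \rangle) - f(S) = (1 - f(S)) \cdot f(\langle a \rangle)$ and the resulting inequalities hold on the sampled schedules.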

1.2  Sufficient  conditions 

In  some  cases  of  practical  interest,  /  will  not  satisfy  the  submodularity  condition  but  will  still 
satisfy  weaker  conditions  that  are  sufficient  for  our  results  to  carry  through. 

In the offline setting, our results will hold for any function $f$ that satisfies the monotonicity
condition and, additionally, satisfies the following condition (we prove in §3 that any submodular
function satisfies this weaker condition).

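Condition 1. For any $S_1, S \in \mathcal{S}$,

$$\frac{f(S_1 \oplus S) - f(S_1)}{\ell(S)} \le \max_{(v, \tau) \in V \times \mathbb{R}_{>0}} \left\{ \frac{f(S_1 \oplus \langle (v, \tau) \rangle) - f(S_1)}{\tau} \right\} .$$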

Recall that $\ell(S)$ equals the sum of the durations of the actions in $S$. Informally, Condition 1
says that the increase in $f$ per unit time that results from performing a sequence of actions $S$ is
always bounded by the maximum, over all actions $(v, \tau)$, of the increase in $f$ per unit time that
results from performing that action.

In the online setting, our results will apply if each function $f_i$ in the sequence
$\langle f_1, f_2, \ldots, f_n \rangle$ satisfies the monotonicity condition and, additionally, the
sequence as a whole satisfies the following condition (we prove in §4 that if each $f_i$ is a job,
then this condition is satisfied).

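Condition 2. For any sequence $S_1, S_2, \ldots, S_n$ of schedules and any schedule $S$,

$$\frac{\sum_{i=1}^{n} \left( f_i(S_i \oplus S) - f_i(S_i) \right)}{\ell(S)} \le \max_{(v, \tau) \in V \times \mathbb{R}_{>0}} \left\{ \frac{\sum_{i=1}^{n} \left( f_i(S_i \oplus \langle (v, \tau) \rangle) - f_i(S_i) \right)}{\tau} \right\} .$$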


This generality allows us to handle jobs similar to the job defined in Example 1, but where an
action $(v, \tau)$ may represent continuing a run of algorithm $v$ for an additional $\tau$ time units
(rather than running $v$ with a fresh random seed). Note that the function $f$ defined in Example 1
is no longer submodular when actions of this form are allowed.

1.3  Summary  of  results 

We  first  consider  the  offline  problems  Budgeted  Maximum  Submodular  Coverage  and 
Min-Sum Submodular Cover. As immediate consequences of existing results [10, 11], we
find that, for any $\epsilon > 0$, (i) achieving an approximation ratio of $4 - \epsilon$ for Min-Sum
Submodular Cover is NP-hard and (ii) achieving an approximation ratio of $1 - \frac{1}{e} + \epsilon$
for Budgeted Maximum Submodular Coverage is NP-hard. We then present a greedy approximation
algorithm that simultaneously achieves the optimal approximation ratio of 4 for Min-Sum Submodular
Cover and the optimal approximation ratio of $1 - \frac{1}{e}$ for Budgeted Maximum Submodular
Coverage, building on and generalizing previous work on special cases of these two problems

mm. 

The main contribution of this paper, however, is to address the online setting. In this
setting we provide an online algorithm whose worst-case performance approaches that of the
offline greedy approximation algorithm asymptotically (as the number of jobs approaches infinity).
More specifically, we analyze the online algorithm’s performance in terms of “$\alpha$-regret”. For the
cost-minimization objective, $\alpha$-regret is defined as the difference between the average cost of the
schedules selected by the online algorithm and $\alpha$ times the average cost of the optimal schedule
for the given sequence of jobs. For the coverage-maximization objective, $\alpha$-regret is the difference
between $\alpha$ times the average coverage of the optimal fixed schedule and the average coverage of
the schedules selected by the online algorithm. For the objective of minimizing cost, the online
algorithm’s 4-regret approaches zero as $n \to \infty$, while for the objective of maximizing coverage,
its $(1 - \frac{1}{e})$-regret approaches zero as $n \to \infty$. Assuming P ≠ NP, these guarantees are
essentially the best possible among online algorithms that make decisions in polynomial time.
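The two regret definitions translate directly into arithmetic. The sketch below assumes the per-job costs (or coverages) of the online algorithm and the average cost (or coverage) of the best fixed schedule are already known; the function names and argument layout are illustrative, not the paper's.

```python
import math


def alpha_regret_cost(online_costs, best_fixed_avg_cost, alpha=4.0):
    """Average online cost minus alpha times the best fixed schedule's average cost."""
    avg_online = sum(online_costs) / len(online_costs)
    return avg_online - alpha * best_fixed_avg_cost


def alpha_regret_coverage(online_coverages, best_fixed_avg_coverage,
                          alpha=1.0 - 1.0 / math.e):
    """alpha times the best fixed schedule's average coverage minus average online coverage."""
    avg_online = sum(online_coverages) / len(online_coverages)
    return alpha * best_fixed_avg_coverage - avg_online
```

Under these definitions a value at or below zero means the online algorithm did at least as well as $\alpha$ times the benchmark, so $\alpha$-regret can be negative.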

Our online algorithms can be used in several different feedback settings. We first consider the
feedback setting in which, after using schedule $S_i$ to complete job $f_i$, we receive complete access
to $f_i$. We then consider more limited feedback settings in which: (i) to receive access to $f_i$ we
must pay a price $C$, which is added to the regret, (ii) we only observe
$f_i\left((S_i)_{\langle t \rangle}\right)$ for each $t > 0$, and (iii) we only observe $f_i(S_i)$.
We also prove tight information-theoretic lower bounds on 1-regret, and discuss exponential
time online algorithms whose regret matches the lower bounds to within logarithmic factors.
Interestingly, these lower bounds also match the upper bounds from our online greedy approximation
algorithm up to logarithmic factors, although the latter apply to $\alpha$-regret (for $\alpha = 4$ or
$\alpha = 1 - \frac{1}{e}$) rather than 1-regret.

1.4  Problems  that  fit  into  this  framework 

We  now  discuss  how  a  number  of  previously-studied  problems  fit  into  our  framework. 

1.4.1  Special  cases  of  Budgeted  Maximum  Submodular  Coverage 

The  Budgeted  Maximum  Submodular  Coverage  problem  introduced  in  this  paper  is  a 
slight  generalization  of  the  problem  of  maximizing  a  monotone  submodular  set  function  subject 
to a knapsack constraint [21, 29]. The only difference between the two problems is that, in the
latter problem, $f(S)$ may only depend on the set of actions in the sequence $S$, and not on the order
in which the actions appear. The problem of maximizing a monotone submodular set function
subject to a knapsack constraint in turn generalizes Budgeted Maximum Coverage [19],
which generalizes Max k-Coverage [26].

1.4.2  Special  cases  of  Min-Sum  Submodular  Cover 

The Min-Sum Submodular Cover problem introduced in this paper generalizes several
previously-studied problems, including Min-Sum Set Cover [11], Pipelined Set Cover [17, 25], the
problem of constructing efficient sequences of trials O, and the problem of constructing restart
schedules [14, 23, 28]. Specifically, these problems can be represented in our framework by jobs
of the form

$$f\left(\langle (v_1, \tau_1), (v_2, \tau_2), \ldots, (v_L, \tau_L) \rangle\right) = \frac{1}{n} \sum_{i=1}^{n} \left( 1 - \prod_{l=1}^{L} \left( 1 - p_i(v_l, \tau_l) \right) \right) . \tag{1.3}$$

This expression can be interpreted as follows: the job $f$ consists of $n$ subtasks, and $p_i(v, \tau)$
is the probability that investing time $\tau$ in activity $v$ completes the $i$th subtask. Thus, $f(S)$
is the expected fraction of subtasks that are finished after performing the sequence of actions in
$S$. Assuming $p_i(v, \tau)$ is a non-decreasing function of $\tau$ for all $i$ and $v$, it can be shown
that any function $f$ of this form satisfies the monotonicity and submodularity properties required
of a job. In the special case $n = 1$, this follows from Example 1. In the general case $n > 1$, this
follows from the fact (which follows immediately from the definitions) that any convex combination
of jobs is a job.

The problem of computing restart schedules places no further restrictions on $p_i(v, \tau)$. Pipelined
Set Cover is the special case in which for each activity $v$ there is an associated time $\tau_v$, and
$p_i(v, \tau) = 1$ if $\tau \ge \tau_v$ and $p_i(v, \tau) = 0$ otherwise. Min-Sum Set Cover is the special
case in which, additionally, $\tau_v = 1$ or $\tau_v = \infty$ for all $v \in V$. The problem of
constructing efficient sequences of trials corresponds to the case in which we are given a matrix
$q$, and $p_i(v, \tau) = q_{v,i}$ if $\tau \ge 1$ and $p_i(v, \tau) = 0$ otherwise.
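A job of the form (1.3) is straightforward to evaluate once the subtask probabilities are available. The sketch below assumes they are supplied as a Python callable `p(i, v, tau)`; the helper `make_job` and the small matrix `Q` (in the style of the sequences-of-trials special case) are hypothetical examples, not data from the paper.

```python
def make_job(p, n):
    """Build f as in (1.3): the expected fraction of n subtasks completed.

    p(i, v, tau) is the assumed probability that spending time tau on
    activity v completes subtask i.
    """
    def f(schedule):
        total = 0.0
        for i in range(n):
            fail = 1.0                      # probability subtask i is still unfinished
            for v, tau in schedule:
                fail *= 1.0 - p(i, v, tau)
            total += 1.0 - fail
        return total / n
    return f


# Hypothetical matrix q[v][i] for the "sequences of trials" special case.
Q = {"v1": [0.5, 0.0], "v2": [0.0, 1.0]}


def p_trials(i, v, tau):
    """p_i(v, tau) = q_{v,i} if tau >= 1, and 0 otherwise."""
    return Q[v][i] if tau >= 1 else 0.0


trials_job = make_job(p_trials, n=2)
```

With this matrix, `trials_job([("v1", 1), ("v2", 1)])` evaluates to 0.75: the first subtask is finished with probability 0.5 and the second with probability 1.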
1.5 Applications

We now discuss applications of the results presented in this paper. The first application,
“Combining multiple heuristics online”, is evaluated experimentally in §6. Evaluating the remaining
applications is an interesting area of future work.

1.5.1  Combining  multiple  heuristics  online 

An algorithm portfolio [15] is a schedule for interleaving the execution of multiple (randomized)
algorithms and periodically restarting them with a fresh random seed. Previous work has shown
that combining multiple heuristics for NP-hard problems into a portfolio can dramatically reduce
average-case running time [12, 15, 27]. In particular, algorithms based on chronological
backtracking often exhibit heavy-tailed run length distributions, and periodically restarting them
with a fresh random seed can reduce the mean running time by orders of magnitude [13]. Our
algorithms can be used to learn an effective algorithm portfolio online, in the course of solving a
sequence of problem instances.


1.5.2  Database  query  optimization 


In  database  query  processing,  one  must  extract  all  the  records  in  a  database  that  satisfy  every 
predicate  in  a  list  of  one  or  more  predicates  (the  conjunction  of  predicates  comprises  the  query). 
To  process  the  query,  each  record  is  evaluated  against  the  predicates  one  at  a  time  until  the  record 
either  fails  to  satisfy  some  predicate  (in  which  case  it  does  not  match  the  query)  or  all  predicates 
have  been  examined.  The  order  in  which  the  predicates  are  examined  affects  the  time  required  to 
process the query. Munagala et al. [25] introduced and studied a problem called Pipelined Set
Cover, which entails finding an evaluation order for the predicates that minimizes the average
time required to process a record. As discussed in §1.4, Pipelined Set Cover is a special case
of Min-Sum Submodular Cover. In the online version of Pipelined Set Cover, records
arrive one at a time and one may select a different evaluation order for each record. In our terms,
the records are jobs and predicates are activities.
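The effect of predicate order on processing time can be seen in a toy computation. The sketch below assumes each predicate has a fixed evaluation time and that evaluation of a record stops at the first predicate it fails; the predicate names, timings, and records are invented for illustration.

```python
def processing_time(record, order, predicates, eval_time):
    """Time spent on one record: evaluate predicates in order until one fails."""
    elapsed = 0.0
    for name in order:
        elapsed += eval_time[name]
        if not predicates[name](record):
            break                      # record rejected; no further predicates needed
    return elapsed


def average_time(records, order, predicates, eval_time):
    """Average processing time per record for a fixed evaluation order."""
    total = sum(processing_time(r, order, predicates, eval_time) for r in records)
    return total / len(records)


# Hypothetical query: "record is even AND record >= 8" over the integers 0..9.
PREDICATES = {"even": lambda r: r % 2 == 0, "big": lambda r: r >= 8}
EVAL_TIME = {"even": 1.0, "big": 1.0}
RECORDS = list(range(10))
```

Here evaluating `big` first gives an average of 1.2 time units per record, against 1.5 when `even` goes first, because `big` rejects more records early.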


1.5.3  Sensor  placement 


Sensor  placement  is  the  task  of  assigning  locations  to  a  set  of  sensors  so  as  to  maximize  the  value 
of  the  information  obtained  (e.g.,  to  maximize  the  number  of  intrusions  that  are  detected  by  the 
sensors). Many sensor placement problems can be optimally solved by maximizing a monotone
submodular set function subject to a knapsack constraint [20]. As discussed in §1.4, this problem
is a special case of Budgeted Maximum Submodular Coverage. Our online algorithms
could  be  used  to  select  sensor  placements  when  the  same  set  of  sensors  is  repeatedly  deployed  in 
an  unknown  or  adversarial  environment. 
1.5.4 Viral marketing


Viral  marketing  infects  a  set  of  agents  (e.g.,  individuals  or  groups)  with  an  advertisement  which 
they  may  pass  on  to  other  potential  customers.  Under  a  standard  model  of  social  network  dynamics, 
the  total  number  of  potential  customers  that  are  influenced  by  the  advertisement  is  a  submodular 
function of the set of agents that are initially infected [18]. Previous work [18] gave an algorithm
for  selecting  a  set  of  agents  to  initially  infect  so  as  to  maximize  the  influence  of  an  advertisement, 
assuming  the  dynamics  of  the  social  network  are  known.  In  theory,  our  online  algorithms  could  be 
used  to  adapt  a  marketing  campaign  to  unknown  or  time-varying  social  network  dynamics. 


2  Related  Work 

As discussed in §1.4, the Min-Sum Submodular Cover problem introduced in this paper
generalizes several previously-studied problems, including Min-Sum Set Cover [11], Pipelined
Set Cover [17, 25], the problem of constructing efficient sequences of trials 0, and the problem
of constructing restart schedules [14, 23, 28].

Several of these problems have been considered in the online setting. Munagala et al. [25]
gave an online algorithm for Pipelined Set Cover whose $O(\log |V|)$-regret is $o(n)$, where $n$
is the number of records (jobs). Babu et al. [5] and Kaplan et al. [17] gave online algorithms
for Pipelined Set Cover whose 4-regret is $o(n)$, but these bounds hold only in the special
case  where  the  jobs  are  drawn  independently  at  random  from  a  fixed  probability  distribution.  The 
online  setting  in  this  paper,  where  the  sequence  of  jobs  may  be  arbitrary,  is  more  challenging  from 
a  technical  point  of  view. 

As  already  mentioned,  Budgeted  Maximum  Submodular  Coverage  generalizes  the 
problem  of  maximizing  a  monotone  submodular  set  function  subject  to  a  knapsack  constraint. 
Previous work gave offline greedy approximation algorithms for this problem [21, 29], which
generalized earlier algorithms for Budgeted Maximum Coverage [19] and Max k-Coverage
[26]. To our knowledge, none of these three problems have previously been studied in an online
setting. 

It  is  worth  pointing  out  that  the  online  problems  we  consider  here  are  quite  different  from 
online  set  cover  problems  that  require  one  to  construct  a  single  collection  of  sets  that  cover  each 
element  in  a  sequence  of  elements  that  arrive  online  (11 01 .  Likewise,  our  work  is  orthogonal  to 
work on online facility location problems [24].

The  main  technical  contribution  of  this  paper  is  to  convert  some  specific  greedy  approximation 
algorithms into online algorithms. Recently, Kakade et al. [16] gave a generic procedure for
converting an $\alpha$-approximation algorithm for a linear problem into an online algorithm whose
$\alpha$-regret is $o(n)$, and this procedure could be applied to the problems considered in this paper.
However,
both  the  running  time  of  their  algorithm  and  the  resulting  regret  bounds  depend  on  the  dimension 
of  the  linear  problem,  and  a  straightforward  application  of  their  algorithm  leads  to  running  time 
and  regret  bounds  that  are  exponential  in  |V|. 
3 Offline Algorithms

In this section we consider the offline problems Budgeted Maximum Submodular Coverage
and Min-Sum Submodular Cover. In the offline setting, we are given as input a job
$f : \mathcal{S} \to [0, 1]$. Our goal is to compute a schedule $S$ that achieves one of two objectives:
for Budgeted Maximum Submodular Coverage, we wish to maximize $f(S)$ subject to the
constraint $\ell(S) \le T$ (for some fixed $T > 0$), while for Min-Sum Submodular Cover, we
wish to minimize the cost $c(f, S)$.

The  offline  algorithms  presented  in  this  section  will  serve  as  the  basis  for  the  online  algorithms 
we  develop  in  the  next  section. 

Note that we have defined the offline problem in terms of optimizing a single job. However,
given a set $\{f_1, f_2, \ldots, f_n\}$, we can optimize average schedule cost (or coverage) by applying
our offline algorithm to the job $f = \frac{1}{n} \sum_{i=1}^{n} f_i$ (as already mentioned, any convex
combination of jobs is a job).


3.1  Computational  complexity 


Both of the offline problems considered in this paper are NP-hard even to approximate. As
discussed in §1.4, Min-Sum Submodular Cover generalizes Min-Sum Set Cover, and Budgeted
Maximum Submodular Coverage generalizes Max k-Coverage. In a classic paper,
Feige proved that for any $\epsilon > 0$, achieving an approximation ratio of $1 - \frac{1}{e} + \epsilon$
for Max k-Coverage is NP-hard [10]. Recently, Feige, Lovász, and Tetali [11] introduced Min-Sum
Set Cover and proved that for any $\epsilon > 0$, achieving a $4 - \epsilon$ approximation ratio for Min-Sum
Set Cover is NP-hard. These observations immediately yield the following theorems.


Theorem 1. For any $\epsilon > 0$, achieving a $1 - \frac{1}{e} + \epsilon$ approximation ratio for
Budgeted Maximum Submodular Coverage is NP-hard.


Theorem 2. For any $\epsilon > 0$, achieving a $4 - \epsilon$ approximation ratio for Min-Sum
Submodular Cover is NP-hard.


3.2  Greedy  approximation  algorithm 


In this section we present a greedy approximation algorithm that can be used to achieve a 4
approximation for Min-Sum Submodular Cover and a $1 - \frac{1}{e}$ approximation for Budgeted
Maximum Submodular Coverage. By Theorems 1 and 2, achieving a better approximation
ratio for either problem is NP-hard.

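Consider the schedule defined by the following simple greedy rule. Let
$G = \langle g_1, g_2, \ldots \rangle$ be the schedule defined inductively as follows:
$G_1 = \langle \rangle$, $G_j = \langle g_1, g_2, \ldots, g_{j-1} \rangle$ for $j > 1$, and

$$g_j = \arg\max_{(v, \tau) \in V \times \mathbb{R}_{>0}} \left\{ \frac{f(G_j \oplus \langle (v, \tau) \rangle) - f(G_j)}{\tau} \right\} . \tag{3.1}$$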


That  is,  G  is  constructed  by  greedily  appending  an  action  (v,  r)  to  the  schedule  so  as  to  maximize 
the  resulting  increase  in  /  per  unit  time. 
Once we reach a $j$ such that $f(G_j) = 1$, we may stop adding actions to the schedule. In
general, however, $G$ may contain an infinite number of actions. For example, if each action
$(v, \tau)$ represents running a Las Vegas algorithm $v$ for time $\tau$ and $f(S)$ is the probability
that any of the runs in $S$ return a solution to some problem instance (see Example 1), it is
possible that $f(S) < 1$ for any finite schedule $S$. The best way of dealing with this is
application-dependent. In the case of Example 1, we might stop computing $G$ when
$f(G_j) \ge 1 - \delta$ for some small $\delta > 0$.

The time required to compute $G$ is also application-dependent. In the applications of interest to
us, evaluating the arg max in (3.1) will only require us to consider a finite number of actions
$(v, \tau)$. In some cases, the evaluation of the arg max in (3.1) can be sped up using
application-specific data structures.
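When the arg max in (3.1) ranges over a finite candidate set, the greedy rule can be sketched in a few lines. The code below is an illustration under that assumption: `candidates` is a finite list of `(activity, duration)` pairs, `delta` plays the role of the stopping threshold discussed above, and the small set-coverage job used as an example is hypothetical.

```python
def greedy_schedule(f, candidates, delta=1e-9, max_actions=1000):
    """Repeatedly append the candidate action with the largest gain in f per unit time."""
    schedule = []
    for _ in range(max_actions):
        base = f(schedule)
        if base >= 1.0 - delta:            # job (almost) complete: stop
            break

        def gain_rate(action):
            return (f(schedule + [action]) - base) / action[1]

        best = max(candidates, key=gain_rate)
        if gain_rate(best) <= 0.0:         # no candidate makes progress
            break
        schedule.append(best)
    return schedule


# Hypothetical coverage job: activity v covers COVERS[v] once run for REQUIRED[v] time.
COVERS = {"a": {1, 2, 3}, "b": {1, 2, 3, 4, 5, 6}, "c": {4, 5}}
REQUIRED = {"a": 1.0, "b": 3.0, "c": 1.0}
UNIVERSE_SIZE = 6


def coverage_job(schedule):
    covered = set()
    for v, tau in schedule:
        if tau >= REQUIRED[v]:
            covered |= COVERS[v]
    return len(covered) / UNIVERSE_SIZE


CANDIDATES = [(v, REQUIRED[v]) for v in ("a", "b", "c")]
```

On this example the greedy rule picks `a`, then `c`, then `b`. Each step re-evaluates `f` once per candidate, which is the cost that application-specific data structures would aim to reduce.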

As mentioned in §1.2, our analysis of the greedy approximation algorithm will only require
that $f$ is monotone and that $f$ satisfies Condition 1. The following lemma shows that if $f$ is a
job, then $f$ also satisfies these conditions.


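Lemma 1. If $f$ satisfies (1.1), then $f$ satisfies Condition 1. That is, for any schedules
$S_1, S \in \mathcal{S}$, we have

$$\frac{f(S_1 \oplus S) - f(S_1)}{\ell(S)} \le \max_{(v, \tau) \in V \times \mathbb{R}_{>0}} \left\{ \frac{f(S_1 \oplus \langle (v, \tau) \rangle) - f(S_1)}{\tau} \right\} .$$

Proof. Let $r$ denote the right hand side of the inequality. Let
$S = \langle a_1, a_2, \ldots, a_L \rangle$, where $a_l = (v_l, \tau_l)$. Let

$$\Delta_l = f(S_1 \oplus \langle a_1, a_2, \ldots, a_l \rangle) - f(S_1 \oplus \langle a_1, a_2, \ldots, a_{l-1} \rangle) .$$

We have

$$\begin{aligned}
f(S_1 \oplus S) &= f(S_1) + \sum_{l=1}^{L} \Delta_l && \text{(telescoping series)} \\
&\le f(S_1) + \sum_{l=1}^{L} \left( f(S_1 \oplus \langle a_l \rangle) - f(S_1) \right) && \text{(submodularity)} \\
&\le f(S_1) + \sum_{l=1}^{L} r \cdot \tau_l && \text{(definition of } r \text{)} \\
&= f(S_1) + r \cdot \ell(S) .
\end{aligned}$$

Rearranging this inequality gives $\frac{f(S_1 \oplus S) - f(S_1)}{\ell(S)} \le r$, as claimed. □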

The  key  to  the  analysis  of  the  greedy  approximation  algorithm  is  the  following  fact,  which  is 
the  only  property  of  G  that  we  will  use  in  our  analysis. 

Fact 1. For any schedule $S$, any positive integer $j$, and any $t > 0$, we have

$$f(S_{\langle t \rangle}) \le f(G_j) + t \cdot s_j$$

where $s_j$ is the $j$th value of the maximum in (3.1).

Fact 1 holds because $f(S_{\langle t \rangle}) \le f(G_j \oplus S_{\langle t \rangle})$ by monotonicity, while
$f(G_j \oplus S_{\langle t \rangle}) \le f(G_j) + t \cdot s_j$ by Condition 1 and the definition of $s_j$.
3.2.1 Maximizing coverage


We first analyze the performance of the greedy algorithm on the Budgeted Maximum Submodular
Coverage problem. The following theorem shows that, for certain values of $T$, the
greedy schedule achieves the optimal approximation ratio of $1 - \frac{1}{e}$ for this problem. The
proof of the theorem is similar to arguments in [21, 29].

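Theorem 3. Let $L$ be a positive integer, and let $T = \sum_{j=1}^{L} \tau_j$, where
$g_j = (v_j, \tau_j)$. Then
$f(G_{\langle T \rangle}) > \left(1 - \frac{1}{e}\right) \max_{S \in \mathcal{S}} \{ f(S_{\langle T \rangle}) \}$.

Proof. Let $C^* = \max_{S \in \mathcal{S}} \{ f(S_{\langle T \rangle}) \}$, and for any positive integer
$j$, let $\Delta_j = C^* - f(G_j)$. By Fact 1, $C^* \le f(G_j) + T s_j$. Thus

$$\Delta_j \le T s_j = T \cdot \frac{\Delta_j - \Delta_{j+1}}{\tau_j} .$$

Rearranging this inequality gives $\Delta_{j+1} \le \Delta_j \left(1 - \frac{\tau_j}{T}\right)$. Unrolling
this inequality, we get

$$\Delta_{L+1} \le \Delta_1 \prod_{j=1}^{L} \left(1 - \frac{\tau_j}{T}\right) .$$

Subject to the constraint $\sum_{j=1}^{L} \tau_j = T$, the product series is maximized when
$\tau_j = \frac{T}{L}$ for all $j$. Thus we have

$$C^* - f(G_{L+1}) = \Delta_{L+1} \le \Delta_1 \left(1 - \frac{1}{L}\right)^{L} < \Delta_1 \cdot \frac{1}{e} \le C^* \cdot \frac{1}{e} .$$

Thus $f(G_{L+1}) > \left(1 - \frac{1}{e}\right) C^*$, as claimed. □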
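The bound on the product in the proof above can be spelled out in one line. The sketch below uses two standard facts that the text does not state explicitly and that are assumed here: the arithmetic-geometric mean inequality, and $1 - x \le e^{-x}$ (strict for $x > 0$). Each factor is non-negative because $\tau_j \le T$.

```latex
\prod_{j=1}^{L}\left(1 - \frac{\tau_j}{T}\right)
  \;\le\; \left(\frac{1}{L}\sum_{j=1}^{L}\left(1 - \frac{\tau_j}{T}\right)\right)^{L}
  \;=\; \left(1 - \frac{1}{L}\right)^{L}
  \;<\; \left(e^{-1/L}\right)^{L}
  \;=\; \frac{1}{e} .
```

The middle equality uses $\sum_{j=1}^{L} \tau_j = T$, and the first inequality is tight exactly when all $\tau_j$ equal $T/L$.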

Theorem 3 shows that $G$ gives a $1 - \frac{1}{e}$ approximation to the problem of maximizing coverage
at time $T$, provided that $T$ equals the sum of the durations of the actions in $G_j$ for some positive
integer $j$. Under the assumption that $f$ is a job (as opposed to the weaker assumption that $f$
satisfies Condition 1), the greedy algorithm can be combined with the partial enumeration approach
of Khuller et al. [19] to achieve a $1 - \frac{1}{e}$ approximation ratio for any fixed $T$. The idea of
this approach is to guess a sequence $Y = \langle a_1, a_2, a_3 \rangle$ of three actions, and then run
the greedy algorithm on the job $f'(S) = f(Y \oplus S) - f(Y)$ with budget $T - T_0$, where $T_0$ is
the total time consumed by the actions in $Y$. The arguments of [19, 29] show that, for some choice
of $Y$, this yields a $\left(1 - \frac{1}{e}\right)$-approximation. In order for this approach to be
feasible, actions must have discrete durations, so that the number of possible choices of $Y$ is
finite.


3.2.2  Minimizing  cost 

We  next  analyze  the  performance  of  the  greedy  algorithm  on  the  Min-Sum  Submodular  Cover 
problem. The following theorem uses the proof technique of [11] to show that the greedy schedule
$G$ has cost at most 4 times that of the optimal schedule, generalizing results of [11, 17, 23, 27, 28].
As already mentioned, achieving a better approximation ratio is NP-hard.
Theorem 4. $c(f, G) \le 4 \int_{t=0}^{\infty} 1 - \max_{S \in \mathcal{S}} \{ f(S_{\langle t \rangle}) \} \, dt \le 4 \min_{S \in \mathcal{S}} c(f, S)$.

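Proof. Let $R_j = 1 - f(G_j)$; let $x_j = \frac{R_j}{2 s_j}$; let $y_j = \frac{R_j}{2}$; and let
$h(x) = 1 - \max_{S \in \mathcal{S}} \{ f(S_{\langle x \rangle}) \}$. By Fact 1,

$$\max_{S \in \mathcal{S}} \{ f(S_{\langle x_j \rangle}) \} \le f(G_j) + x_j s_j = f(G_j) + \frac{R_j}{2} .$$

Thus $h(x_j) \ge R_j - \frac{R_j}{2} = y_j$. The monotonicity of $f$ implies that $h(x)$ is
non-increasing and also that the sequence $\langle y_1, y_2, \ldots \rangle$ is non-increasing. As
illustrated in Figure 1, these facts imply that
$\int_{x=0}^{\infty} h(x) \, dx \ge \sum_{j \ge 1} x_j (y_j - y_{j+1})$. Thus we have

$$\begin{aligned}
\int_{t=0}^{\infty} 1 - \max_{S \in \mathcal{S}} \{ f(S_{\langle t \rangle}) \} \, dt &= \int_{x=0}^{\infty} h(x) \, dx \\
&\ge \sum_{j \ge 1} x_j (y_j - y_{j+1}) && \text{(Figure 1)} \\
&= \frac{1}{4} \sum_{j \ge 1} R_j \, \frac{R_j - R_{j+1}}{s_j} \\
&= \frac{1}{4} \sum_{j \ge 1} R_j \tau_j \\
&\ge \frac{1}{4} c(f, G) && \text{(monotonicity of } f \text{)}
\end{aligned}$$

which proves the theorem. □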


Figure 1: An illustration of the inequality $\int_{x=0}^{\infty} h(x) \, dx \ge \sum_{j \ge 1} x_j (y_j - y_{j+1})$. The left hand
side is the area under the curve, whereas the right hand side is the sum of the areas of the shaded
rectangles.

3.2.3 A refined greedy approximation algorithm

A drawback of $G$ is that it greedily chooses an action $g_j = (v, \tau)$ that maximizes the marginal
increase in $f$ divided by $\tau$, whereas the contribution of $(v, \tau)$ to the cost of $G$ is not $\tau$ but rather

$$\int_{t=0}^{\tau} 1 - f(G_j \oplus \langle (v, t) \rangle) \, dt.$$


This can lead $G$ to perform suboptimally even in seemingly easy cases. To see this, let
$\mathcal{V} = \{v_1, v_2\}$, let $S^1_t = \langle (v_1, t) \rangle$, and let $S^2_t = \langle (v_2, t) \rangle$. Let $f$ be a job defined by

$$f(S^1_t) = \begin{cases} 1 & \text{if } t \ge 1 \\ 0 & \text{otherwise} \end{cases}$$

whereas

$$f(S^2_t) = \min \{1, t\}.$$

For any schedule $S = \langle a_1, a_2, \ldots, a_k \rangle$ containing more than one action, let
$f(S) = \max_{1 \le i \le k} f(\langle a_i \rangle)$. It is straightforward to check that $f$ satisfies the monotonicity and
submodularity conditions required of a job.

Here the optimal schedule is $S^* = \langle (v_2, 1) \rangle$, with cost $c(f, S^*) = \int_{t=0}^{1} 1 - t \, dt = \frac{1}{2}$. However,
if ties in the evaluation of the arg max in (3.1) are broken appropriately, the greedy algorithm will
choose the schedule $G = \langle (v_1, 1) \rangle$, with cost $c(f, G) = 1$.
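The two costs in this example can be checked numerically. The following is a minimal sketch, assuming the job is represented by two hypothetical Python functions `f_v1` and `f_v2` (names not from the paper) and approximating the cost integral with a midpoint rule:

```python
def f_v1(t):
    """Fraction of the job completed by running v1 for time t (step at t = 1)."""
    return 1.0 if t >= 1.0 else 0.0


def f_v2(t):
    """Fraction of the job completed by running v2 for time t (linear ramp)."""
    return min(1.0, t)


def cost(f, horizon=1.0, steps=100_000):
    """Midpoint-rule estimate of the integral of 1 - f(t) over [0, horizon].

    Both schedules finish the job by t = 1, so the integrand vanishes beyond that.
    """
    dt = horizon / steps
    return sum((1.0 - f((k + 0.5) * dt)) * dt for k in range(steps))


# Both actions have the same gain-per-time ratio, so the greedy rule sees a tie.
ratio_v1 = (f_v1(1.0) - f_v1(0.0)) / 1.0
ratio_v2 = (f_v2(1.0) - f_v2(0.0)) / 1.0

greedy_cost = cost(f_v1)   # schedule <(v1, 1)>
optimal_cost = cost(f_v2)  # schedule <(v2, 1)>
print(ratio_v1, ratio_v2, round(greedy_cost, 6), round(optimal_cost, 6))
```

The tie in the ratios is what allows the greedy rule to pick the schedule whose cost is twice the optimum.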

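To improve performance in cases such as this, it is natural to consider the schedule
$G' = \langle g'_1, g'_2, \ldots \rangle$ defined inductively as follows: $G'_j = \langle g'_1, g'_2, \ldots, g'_{j-1} \rangle$ and

$$g'_j = \arg\max_{(v, \tau) \in \mathcal{V} \times \mathbb{R}_{>0}} \left\{ \frac{f(G'_j \oplus \langle (v, \tau) \rangle) - f(G'_j)}{\int_{t=0}^{\tau} 1 - f(G'_j \oplus \langle (v, t) \rangle) \, dt} \right\}. \qquad (3.2)$$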


Theorem 5 shows that $G'$ achieves the same approximation ratio as $G$. The proof is similar to
the proof of Theorem 4, and is given in Appendix A.

Theorem 5. $c(f, G') \le 4 \int_{t=0}^{\infty} 1 - \max_{S \in \mathcal{S}} \{ f(S_{\langle t \rangle}) \} \, dt \le 4 \min_{S \in \mathcal{S}} \{ c(f, S) \}$.

Furthermore, it can be shown that, in contrast to $G$, $G'$ is optimal in the important special case
when $\mathcal{V} = \{v\}$, action $(v, \tau)$ represents running a Las Vegas algorithm $v$ (with a fresh random
seed) for time $\tau$, and $f(S)$ equals the probability that at least one of the runs in $S$ returns a solution
to some particular problem instance (as described in Example 1).


3.2.4  Handling  non-uniform  additive  error 


We now consider the case in which the $j$th decision made by the greedy algorithm is performed
with some additive error $\epsilon_j$. This case is of interest for two reasons. First, in some cases it may
not be practical to evaluate the arg max in (3.1) exactly. Second, and more importantly, we will
end up viewing our online algorithm as a version of the offline greedy algorithm in which each
decision is made with some additive error. In this section we analyze the original greedy schedule
$G$ as opposed to the refined schedule $G'$ described in the previous section, because it is the original
schedule $G$ that will form the basis of our online algorithm (as we discuss further in §5, devising
an online algorithm based on $G'$ is an interesting open problem).

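We denote by $\bar{G} = \langle \bar{g}_1, \bar{g}_2, \ldots \rangle$ a variant of the schedule $G$ in which the $j$th arg max in (3.1) is
evaluated with additive error $\epsilon_j$. More formally, $\bar{G}$ is a schedule that, for any $j \ge 1$, satisfies

$$\frac{f(\bar{G}_j \oplus \bar{g}_j) - f(\bar{G}_j)}{\bar{\tau}_j} \ge \max_{(v, \tau) \in \mathcal{V} \times \mathbb{R}_{>0}} \left\{ \frac{f(\bar{G}_j \oplus \langle (v, \tau) \rangle) - f(\bar{G}_j)}{\tau} \right\} - \epsilon_j \qquad (3.3)$$

where $\bar{G}_0 = \langle \rangle$, $\bar{G}_j = \langle \bar{g}_1, \bar{g}_2, \ldots, \bar{g}_{j-1} \rangle$ for $j > 1$, and $\bar{g}_j = (\bar{v}_j, \bar{\tau}_j)$.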

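The following two theorems summarize the performance of $\bar{G}$. The proofs are given in
Appendix A, and are along the same lines as those of Theorems 3 and 4.

Theorem 6. Let $L$ be a positive integer, and let $T = \sum_{j=1}^{L} \bar{\tau}_j$, where $\bar{g}_j = (\bar{v}_j, \bar{\tau}_j)$. Then

$$f(\bar{G}_{\langle T \rangle}) > \left(1 - \frac{1}{e}\right) \max_{S \in \mathcal{S}} \{ f(S_{\langle T \rangle}) \} - \sum_{j=1}^{L} \epsilon_j \bar{\tau}_j.$$

Theorem 7. Let $L$ be a positive integer, and let $T = \sum_{j=1}^{L} \bar{\tau}_j$, where $\bar{g}_j = (\bar{v}_j, \bar{\tau}_j)$. For any
schedule $S$, define $c^T(f, S) \equiv \int_{t=0}^{T} 1 - f(S_{\langle t \rangle}) \, dt$. Then

$$c^T(f, \bar{G}) \le 4 \int_{t=0}^{\infty} 1 - \max_{S \in \mathcal{S}} \{ f(S_{\langle t \rangle}) \} \, dt + \sum_{j=1}^{L} E_j \bar{\tau}_j,$$

where $E_j = \sum_{i < j} \epsilon_i \bar{\tau}_i$.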

4  Online  Algorithms 

In this section we consider the online versions of Budgeted Maximum Submodular Coverage
and Min-Sum Submodular Cover. In the online setting we are fed, one at a time, a
sequence $\langle f_1, f_2, \ldots, f_n \rangle$ of jobs. Prior to receiving job $f_i$, we must specify a schedule $S_i$. We then
receive complete access to the function $f_i$. We measure the performance of our online algorithm
using two different notions of regret. For the cost objective, our goal is to minimize the 4-regret

$$R_{\mathrm{cost}} \equiv \sum_{i=1}^{n} c^T(S_i, f_i) - 4 \cdot \min_{S \in \mathcal{S}} \left\{ \sum_{i=1}^{n} c(S, f_i) \right\}$$

for some fixed $T > 0$. Here, for any schedule $S$ and job $f$, we define $c^T(S, f) \equiv \int_{t=0}^{T} 1 - f(S_{\langle t \rangle}) \, dt$
to be the value of $c(S, f)$ when the integral is truncated at time $T$. Some form of truncation is
necessary because $c(S_i, f_i)$ could be infinite, and without bounding it we could not prove any finite
bound on regret (our regret bounds will be stated as a function of $T$).

For the objective of maximizing the coverage at time $T$, our goal is to minimize the
$(1 - \frac{1}{e})$-regret

$$R_{\mathrm{coverage}} \equiv \left(1 - \frac{1}{e}\right) \max_{S \in \mathcal{S}} \left\{ \sum_{i=1}^{n} f_i(S_{\langle T \rangle}) \right\} - \sum_{i=1}^{n} f_i(S_i)$$

where we require that $E[\ell(S_i)] = T$, in expectation over the online algorithm's random bits. In
other words, we allow the online algorithm to treat $T$ as a budget in expectation, rather than a hard
budget.

Our goal is to bound the expected values of $R_{\mathrm{cost}}$ (resp. $R_{\mathrm{coverage}}$) on the worst-case sequence
of $n$ jobs. We consider the so-called oblivious adversary model, in which the sequence of jobs is
fixed in advance and does not change in response to the decisions made by our online algorithm,
although we believe our results can be readily extended to the case of adaptive adversaries. Note
that the constant of 4 in the definition of $R_{\mathrm{cost}}$ and the constant of $1 - \frac{1}{e}$ in the definition of
$R_{\mathrm{coverage}}$ stem from the NP-hardness of the corresponding offline problems, as discussed in §3.1.


For the purposes of the results in this section, we confine our attention to schedules that consist
of actions that come from some finite set $\mathcal{A}$, and we assume that the actions in $\mathcal{A}$ have integer
durations (i.e., $\mathcal{A} \subseteq \mathcal{V} \times \mathbb{Z}_{>0}$). Note that this is not a serious limitation, because real-valued action
durations can always be discretized at whatever level of granularity is desired.

As mentioned in §1.2, our results in the online setting will hold for any sequence $\langle f_1, f_2, \ldots, f_n \rangle$
of functions that satisfies Condition 2. The following lemma shows that any sequence of jobs
satisfies this condition. The proof follows along the same lines as the proof of Lemma 1, and is given
in Appendix A.

Lemma 2. Any sequence $\langle f_1, f_2, \ldots, f_n \rangle$ of jobs satisfies Condition 2. That is, for any sequence
$S_1, S_2, \ldots, S_n$ of schedules and any schedule $S$,

$$\frac{\sum_{i=1}^{n} f_i(S_i \oplus S) - f_i(S_i)}{\ell(S)} \le \max_{(v, \tau) \in \mathcal{V} \times \mathbb{R}_{>0}} \left\{ \frac{\sum_{i=1}^{n} f_i(S_i \oplus \langle (v, \tau) \rangle) - f_i(S_i)}{\tau} \right\}.$$


4.1  Background:  the  experts  problem 

In the experts problem, one has access to a set of $k$ experts, each of whom gives out a piece of
advice every day. On each day $i$, one must select an expert $e_i$ whose advice to follow. Following
the advice of expert $j$ on day $i$ yields a reward $x^i_j$. At the end of day $i$, the value of the reward $x^i_j$
for each expert $j$ is made public, and can be used as the basis for making choices on subsequent
days. One's regret at the end of $n$ days is equal to

$$\max_{1 \le j \le k} \left\{ \sum_{i=1}^{n} x^i_j \right\} - \sum_{i=1}^{n} x^i_{e_i}.$$

Note that the historical performance of an expert does not imply any guarantees about its future
performance. Remarkably, randomized decision-making algorithms nevertheless exist whose
regret grows sub-linearly in the number of days. By picking experts using such an algorithm, one can
guarantee to obtain (asymptotically as $n \to \infty$) an average reward that is as large as the maximum
reward that could have been obtained by following the advice of any fixed expert for all $n$ days.

In particular, for any fixed value of $G_{\max}$, where $G_{\max} = \max_{1 \le j \le k} \{ \sum_{i=1}^{n} x^i_j \}$, the randomized
weighted majority algorithm (WMR) [22] can be used to achieve worst-case regret $O(\sqrt{G_{\max} \ln k})$.
If $G_{\max}$ is not known in advance, a putative value can be guessed and doubled to achieve the same
guarantee up to a constant factor.
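To make the experts setting concrete, here is a minimal sketch of an exponential-weights learner in the spirit of randomized weighted majority. It is an illustration under stated assumptions rather than the exact algorithm of [22]: the class name `WeightedMajority`, the update rule `w * exp(eta * payoff)`, and the fixed learning rate `eta` are all choices made for this example, and no doubling trick for an unknown $G_{\max}$ is included.

```python
import math
import random


class WeightedMajority:
    """Exponential-weights experts algorithm for payoffs in [0, 1]."""

    def __init__(self, k, eta=0.1, seed=0):
        self.weights = [1.0] * k      # one weight per expert
        self.eta = eta                # learning rate (hypothetical default)
        self.rng = random.Random(seed)

    def distribution(self):
        """Current probability of selecting each expert."""
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def select(self):
        """Pick an expert with probability proportional to its weight."""
        return self.rng.choices(range(len(self.weights)), weights=self.weights)[0]

    def update(self, payoffs):
        """Reweight every expert once the day's payoffs are made public."""
        self.weights = [w * math.exp(self.eta * x)
                        for w, x in zip(self.weights, payoffs)]


# Toy run: three experts with fixed daily payoffs over n = 500 days.
payoffs = [0.2, 0.5, 0.9]
learner = WeightedMajority(k=3)
earned = 0.0
for _ in range(500):
    earned += payoffs[learner.select()]
    learner.update(payoffs)
regret = 500 * max(payoffs) - earned
```

In this toy run the learner's distribution concentrates on the best expert, so the regret stays far below the trivial bound of 350 that always following the worst expert would incur.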

4.2  Unit-cost  actions 

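In the special case in which each action takes unit time (i.e., $\mathcal{A} \subseteq \mathcal{V} \times \{1\}$), our online algorithm
$\mathrm{OG}_{\mathrm{unit}}$ is very simple. $\mathrm{OG}_{\mathrm{unit}}$ runs $T$ experts algorithms:² $\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_T$, where $T$ is the number
of time steps for which our schedule is defined. The set of experts is $\mathcal{A}$. Just before job $f_i$ arrives,
each experts algorithm $\mathcal{E}_t$ selects an action $a^i_t$. The schedule used by $\mathrm{OG}_{\mathrm{unit}}$ on job $f_i$ is
$S_i = \langle a^i_1, a^i_2, \ldots, a^i_T \rangle$. The payoff that $\mathcal{E}_t$ associates with action $a$ is
$f_i(S_{i \langle t-1 \rangle} \oplus a) - f_i(S_{i \langle t-1 \rangle})$.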


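Algorithm $\mathrm{OG}_{\mathrm{unit}}$

Input: integer $T$, experts algorithms $\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_T$.
For $i$ from 1 to $n$:

1. For each $t$, $1 \le t \le T$, use $\mathcal{E}_t$ to select an action $a^i_t$.

2. Select the schedule $S_i = \langle a^i_1, a^i_2, \ldots, a^i_T \rangle$.

3. Receive the job $f_i$.

4. For each $t$, $1 \le t \le T$, and each action $a \in \mathcal{A}$, feed back
   $f_i(S_{i \langle t-1 \rangle} \oplus a) - f_i(S_{i \langle t-1 \rangle})$ as the payoff $\mathcal{E}_t$ would have received
   by choosing action $a$.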
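The four steps above can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' implementation: jobs are modeled as hypothetical set-coverage instances (a dict with keys `"sets"` and `"size"`), and each experts algorithm is replaced by a simple exponential-weights vector with a made-up learning rate `eta`.

```python
import math
import random


def coverage(job, schedule):
    """f(S): fraction of the job's items covered by the actions in S."""
    covered = set()
    for action in schedule:
        covered |= job["sets"][action]
    return len(covered) / job["size"]


def og_unit(jobs, actions, T, eta=0.5, seed=0):
    rng = random.Random(seed)
    # One weight vector per time step t plays the role of experts algorithm E_t.
    weights = [{a: 1.0 for a in actions} for _ in range(T)]
    total = 0.0
    for job in jobs:
        # Steps 1-2: each E_t picks an action before the job is revealed.
        schedule = [rng.choices(actions, weights=[w[a] for a in actions])[0]
                    for w in weights]
        # Step 3: the job is revealed; record the coverage achieved.
        total += coverage(job, schedule)
        # Step 4: feed every E_t the marginal gain each action would have had.
        for t, w in enumerate(weights):
            prefix = schedule[:t]
            base = coverage(job, prefix)
            for a in actions:
                w[a] *= math.exp(eta * (coverage(job, prefix + [a]) - base))
    return total / len(jobs), weights


# Toy instance: the same job every round; the best schedule is <a, b>.
job = {"sets": {"a": {1, 2}, "b": {3}, "c": {1}}, "size": 3}
avg_coverage, final_weights = og_unit([job] * 200, ["a", "b", "c"], T=2)
first_choice = max(final_weights[0], key=final_weights[0].get)
second_choice = max(final_weights[1], key=final_weights[1].get)
```

On this toy instance the first weight vector learns to favor the action with the largest standalone gain, and the second learns the action that complements it, mirroring the offline greedy schedule.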

Let $r_t$ be the regret experienced by experts algorithm $\mathcal{E}_t$ when running $\mathrm{OG}_{\mathrm{unit}}$, and let
$R = \sum_{t=1}^{T} r_t$. The key to the analysis of $\mathrm{OG}_{\mathrm{unit}}$ is the following lemma, which relates the regret
experienced by the experts algorithms to the regret on the original online problem.

Lemma 3. $R_{\mathrm{coverage}} \le R$ and $R_{\mathrm{cost}} \le TR$.

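Proof. We will view $\mathrm{OG}_{\mathrm{unit}}$ as producing an approximate version of the offline greedy schedule
for the function $f = \frac{1}{n} \sum_{i=1}^{n} f_i$. First, view the sequence of actions selected by $\mathcal{E}_t$ as a single
"meta-action" $\tilde{a}_t$, and extend the domain of each $f_i$ to include the meta-actions by defining
$f_i(S \oplus \tilde{a}_t) = f_i(S \oplus a^i_t)$ for all $S \in \mathcal{S}$. Thus, the online algorithm produces a single schedule
$\tilde{S} = \langle \tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_T \rangle$ for all $i$. By construction,

$$\frac{r_t}{n} = \max_{a \in \mathcal{A}} \left\{ f(\tilde{S}_{\langle t-1 \rangle} \oplus a) - f(\tilde{S}_{\langle t-1 \rangle}) \right\} - \left( f(\tilde{S}_{\langle t-1 \rangle} \oplus \tilde{a}_t) - f(\tilde{S}_{\langle t-1 \rangle}) \right).$$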

² In general, $\mathcal{E}_1, \ldots, \mathcal{E}_T$ will be $T$ distinct copies of a single experts algorithm, such as randomized
weighted majority.

Thus $\mathrm{OG}_{\mathrm{unit}}$ behaves exactly like the greedy schedule $\bar{G}$ for the function $f$, where the $t$th decision
is made with additive error $\frac{r_t}{n}$.

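Furthermore, the fact that the sequence $\langle f_1, f_2, \ldots, f_n \rangle$ satisfies Condition 2 implies that for
any integer $t$ ($1 \le t \le T$) and any schedule $S$, we have

$$\frac{f(\tilde{S}_{\langle t-1 \rangle} \oplus S) - f(\tilde{S}_{\langle t-1 \rangle})}{\ell(S)} \le \max_{(v, \tau) \in \mathcal{V} \times \mathbb{R}_{>0}} \left\{ \frac{f(\tilde{S}_{\langle t-1 \rangle} \oplus \langle (v, \tau) \rangle) - f(\tilde{S}_{\langle t-1 \rangle})}{\tau} \right\}.$$

Thus the function $f$ satisfies Condition 1, so the analysis of the greedy approximation algorithm
in §3.2 applies to the schedule $\tilde{S}$. In particular, Theorem 6 implies that $R_{\mathrm{coverage}} \le \sum_{t=1}^{T} r_t = R$.
Similarly, Theorem 7 implies that $R_{\mathrm{cost}} \le TR$. □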


To complete the analysis, it remains to bound $E[R]$. First, note that the payoffs to each experts
algorithm $\mathcal{E}_t$ depend on the choices made by experts algorithms $\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_{t-1}$, but not
on the choices made by $\mathcal{E}_t$ itself. Thus, from the point of view of $\mathcal{E}_t$, the payoffs are generated
by a non-adaptive adversary. Suppose that randomized weighted majority (WMR) is used
as the subroutine experts algorithm. Because each payoff is at most 1 and there are $n$ rounds,
$E[r_t] = O(\sqrt{G_{\max} \ln |\mathcal{A}|}) = O(\sqrt{n \ln |\mathcal{A}|})$, so a trivial bound is $E[R] = O(T \sqrt{n \ln |\mathcal{A}|})$. In
fact, we can show that the worst case is when $G_{\max} = \Theta(\frac{n}{T})$ for all $T$ experts algorithms, leading
to the following improved bound. The proof is given in Appendix A.

Lemma 4. Algorithm $\mathrm{OG}_{\mathrm{unit}}$, run with WMR as the subroutine experts algorithm, has
$E[R] = O(\sqrt{T n \ln |\mathcal{A}|})$ in the worst case.

Combining Lemmas 3 and 4 yields the following theorem.

Theorem 8. Algorithm $\mathrm{OG}_{\mathrm{unit}}$, run with WMR as the subroutine experts algorithm, has
$E[R_{\mathrm{coverage}}] = O(\sqrt{T n \ln |\mathcal{A}|})$ and $E[R_{\mathrm{cost}}] = O(T \sqrt{T n \ln |\mathcal{A}|})$ in the worst case.


4.3  From  unit-cost  actions  to  arbitrary  actions 

In this section we generalize the online greedy algorithm presented in the previous section to
accommodate actions with arbitrary durations. Like $\mathrm{OG}_{\mathrm{unit}}$, our generalized algorithm OG makes
use of a series of experts algorithms $\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_L$ (for $L$ to be determined). On each round
$i$, OG constructs a schedule $S_i$ as follows: for $t = 1, 2, \ldots, L$, it uses $\mathcal{E}_t$ to choose an action
$a^i_t = (v, \tau) \in \mathcal{A}$, and appends this action to $S_i$ with probability $\frac{1}{\tau}$. The payoff that $\mathcal{E}_t$ associates
with action $a$ equals $\frac{1}{\tau}$ times the increase in $f$ that would have resulted from appending $a$ to the
schedule-under-construction.

Algorithm OG

Input: integer $L$, experts algorithms $\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_L$.

For $i$ from 1 to $n$:

1. Let $S_{i,0} = \langle \rangle$ be the empty schedule.

2. For each $t$, $1 \le t \le L$,

   (a) Use $\mathcal{E}_t$ to choose an action $a^i_t = (v, \tau) \in \mathcal{A}$.

   (b) With probability $\frac{1}{\tau}$, set $S_{i,t} = S_{i,t-1} \oplus \langle a^i_t \rangle$; else set $S_{i,t} = S_{i,t-1}$.

3. Select the schedule $S_i = S_{i,L}$.

4. Receive the job $f_i$.

5. For each $t$, $1 \le t \le L$, and each action $a = (v, \tau) \in \mathcal{A}$, feed back

   $$x^i_{t,a} = \frac{1}{\tau} \left( f_i(S_{i,t-1} \oplus a) - f_i(S_{i,t-1}) \right)$$

   as the payoff $\mathcal{E}_t$ would have received by choosing action $a$.
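The randomized append in step 2(b) and the scaled payoff in step 5 can be illustrated as follows. This is a sketch under assumptions: the function names `build_schedule`, `length`, and `payoff`, the toy job `toy_f`, and the example action list are all hypothetical. The simulation checks the property the analysis relies on, namely that each chosen action contributes one time unit to the schedule in expectation.

```python
import random


def build_schedule(chosen_actions, rng):
    """Step 2 of OG: keep each chosen action (v, tau) with probability 1/tau."""
    schedule = []
    for (v, tau) in chosen_actions:
        if rng.random() < 1.0 / tau:
            schedule.append((v, tau))
    return schedule


def length(schedule):
    """Total duration of a schedule."""
    return sum(tau for _, tau in schedule)


def payoff(f, prefix, action):
    """Step 5 of OG: marginal gain of the action, scaled by 1/tau."""
    _, tau = action
    return (f(prefix + [action]) - f(prefix)) / tau


# Hypothetical choices made by L = 4 experts algorithms on one round.
chosen = [("v1", 1), ("v2", 4), ("v1", 2), ("v3", 5)]
rng = random.Random(0)
trials = 20_000
avg_length = sum(length(build_schedule(chosen, rng)) for _ in range(trials)) / trials


def toy_f(schedule):
    """Toy job: fraction completed grows linearly with total time, capped at 1."""
    return min(1.0, length(schedule) / 10)


example_payoff = payoff(toy_f, [], ("v2", 4))
```

Because an action of duration $\tau$ is kept with probability $\frac{1}{\tau}$, the average schedule length in the simulation is close to $L = 4$ even though the chosen durations sum to 12.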


Our analysis of OG follows along the same lines as the analysis of $\mathrm{OG}_{\mathrm{unit}}$ in the previous
section. As in the previous section, we will view each experts algorithm $\mathcal{E}_t$ as selecting a single
"meta-action" $\tilde{a}_t$. We extend the domain of each $f_i$ to include the meta-actions by defining

$$f_i(S \oplus \tilde{a}_t) = \begin{cases} f_i(S \oplus a^i_t) & \text{if } a^i_t \text{ was appended to } S_i \\ f_i(S) & \text{otherwise.} \end{cases}$$

Thus, the online algorithm produces a single schedule $\tilde{S} = \langle \tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_L \rangle$ for all $i$.

For the purposes of analysis, we will imagine that each meta-action $\tilde{a}_t$ always takes unit time
(whereas in fact, $\tilde{a}_t$ takes unit time per job in expectation). We show later that this assumption does
not invalidate any of our arguments.

Let $f = \frac{1}{n} \sum_{i=1}^{n} f_i$, and let $\tilde{S}_t = \langle \tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_t \rangle$. As in the previous section, the fact that
the sequence $\langle f_1, f_2, \ldots, f_n \rangle$ satisfies Condition 2 implies that $f$ satisfies Condition 1 (even if the
schedule $S_1$ in the statement of Condition 1 contains meta-actions). Thus $\tilde{S}$ can be viewed as a
version of the greedy schedule in which the $t$th decision is made with additive error (by definition)
equal to

$$\epsilon_t = \max_{a = (v, \tau) \in \mathcal{A}} \left\{ \frac{1}{\tau} \left( f(\tilde{S}_{t-1} \oplus a) - f(\tilde{S}_{t-1}) \right) \right\} - \left( f(\tilde{S}_{t-1} \oplus \tilde{a}_t) - f(\tilde{S}_{t-1}) \right)$$

(where we have used the assumption that $\tilde{a}_t$ takes unit time).

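As in the previous section, let $r_t$ be the regret experienced by $\mathcal{E}_t$. In general, $\frac{r_t}{n} \ne \epsilon_t$. However,
we claim that $E[\epsilon_t] = E[\frac{r_t}{n}]$. To see this, fix some integer $t$ ($1 \le t \le L$), let $A_t = \langle a^1_t, a^2_t, \ldots, a^n_t \rangle$
be the sequence of actions selected by $\mathcal{E}_t$, and let $y^i_t$ be the payoff received by $\mathcal{E}_t$ on round $i$ (i.e.,
$y^i_t = x^i_{t, a^i_t}$). By construction,

$$y^i_t = E\left[ f_i(\tilde{S}_{t-1} \oplus \tilde{a}_t) - f_i(\tilde{S}_{t-1}) \,\middle|\, A_t, \tilde{S}_{t-1} \right].$$

Thus,

$$\frac{r_t}{n} = \max_{a = (v, \tau) \in \mathcal{A}} \left\{ \frac{1}{\tau} \left( f(\tilde{S}_{t-1} \oplus a) - f(\tilde{S}_{t-1}) \right) \right\} - E\left[ f(\tilde{S}_{t-1} \oplus \tilde{a}_t) - f(\tilde{S}_{t-1}) \,\middle|\, A_t, \tilde{S}_{t-1} \right].$$

Taking the expectation of both sides of the equations for $\epsilon_t$ and $r_t$ then shows that $E[\epsilon_t] = E[\frac{r_t}{n}]$,
as claimed.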

We now prove a bound on $E[R_{\mathrm{coverage}}]$. As already mentioned, $f$ satisfies Condition 1, so
the greedy schedule's approximation guarantees apply to $f$. In particular, by Theorem 6, we have
$R_{\mathrm{coverage}} \le n \sum_{t=1}^{L} \epsilon_t$. Thus $E[R_{\mathrm{coverage}}] \le E[R]$, where $R = \sum_{t=1}^{L} r_t$.

To bound $E[R_{\mathrm{coverage}}]$, it remains to justify the assumption that each meta-action $\tilde{a}_t$ always
takes unit time. Regardless of what actions are chosen by each experts algorithm, the schedule
is defined for $L$ time steps in expectation. Thus if we set $L = T$, the schedules $S_i$ returned by
OG satisfy the budget in expectation, as required in the definition of $R_{\mathrm{coverage}}$. Thus, as far as
$R_{\mathrm{coverage}}$ is concerned, the meta-actions may as well take unit time (in which case $\ell(S_i) = T$ with
probability 1). Combining the bound on $E[R]$ stated in Lemma 4 with the fact that
$E[R_{\mathrm{coverage}}] \le E[R]$ yields the following theorem.

Theorem 9. Algorithm OG, run with input $L = T$, has $E[R_{\mathrm{coverage}}] \le E[R]$. If WMR is used as
the subroutine experts algorithm, then $E[R] = O(\sqrt{T n \ln |\mathcal{A}|})$.

The argument bounding $E[R_{\mathrm{cost}}]$ is similar, although somewhat more involved, and is given
in Appendix A. Relative to the case of unit-cost actions addressed in the previous section, the
additional complication here is that $\ell(S_i)$ is now a random variable, whereas in the definition of
$R_{\mathrm{cost}}$ the cost of a schedule is always calculated up to time $T$. This complication can be overcome
by making the probability that $\ell(S_i) < T$ sufficiently small, which can be accomplished by setting
$L \gg T$ and applying concentration inequalities. However, $E[R]$ grows as a function of $L$, so we
do not want to make $L$ too large. It turns out that the (approximately) best bound is obtained by
setting $L = T \ln n$.

Theorem 10. Algorithm OG, run with input $L = T \ln n$, has $E[R_{\mathrm{cost}}] = O(T \ln n \cdot E[R] + T \sqrt{n})$.
In particular, $E[R_{\mathrm{cost}}] = O((\ln n)^{\frac{3}{2}} \, T \sqrt{T n \ln |\mathcal{A}|})$ if WMR is used as the subroutine
experts algorithm.
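The second bound in Theorem 10 can be recovered from the first by a short substitution. The sketch below assumes that the bound of Lemma 4 carries over to OG with the number of experts algorithms $T$ replaced by $L$ (as Theorem 9 states for $L = T$); the full argument is in Appendix A.

```latex
\begin{aligned}
E[R] &= O\!\left(\sqrt{L\, n \ln|\mathcal{A}|}\right)
      = O\!\left(\sqrt{T\, n \ln n \,\ln|\mathcal{A}|}\right)
      && (L = T \ln n) \\
T \ln n \cdot E[R]
     &= O\!\left(T \ln n \cdot (\ln n)^{\frac{1}{2}} \sqrt{T\, n \ln|\mathcal{A}|}\right)
      = O\!\left((\ln n)^{\frac{3}{2}}\, T \sqrt{T\, n \ln|\mathcal{A}|}\right)
\end{aligned}
```

For $|\mathcal{A}| \ge 2$ and $n \ge 3$, the additive $T \sqrt{n}$ term is of no larger order than this product, so it is absorbed into the $O(\cdot)$.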


4.4  Dealing  with  limited  feedback 

Thus far we have assumed that, after specifying a schedule $S_i$, the online algorithm receives
complete access to the job $f_i$. We now consider three more limited feedback settings that may arise in
practice:

1. In the priced feedback model, to receive access to $f_i$ we must pay a price $C$. Each time we
   do so, $C$ is added to the regret $R_{\mathrm{coverage}}$, and $TC$ is added to the regret $R_{\mathrm{cost}}$.

2. In the partially transparent feedback model, we only observe $f_i(S_{i \langle t \rangle})$ for each $t > 0$.

3. In the opaque feedback model, we only observe $f_i(S_i)$.

The priced and partially transparent feedback models arise naturally in the case where action
$(v, \tau)$ represents running a deterministic algorithm $v$ for $\tau$ (additional) time units in order to solve
some decision problem. Assuming we halt once some $v$ returns an answer, we obtain exactly the
information that is revealed in the partially transparent model. Alternatively, running each $v$ until
it terminates would completely reveal the function $f_i$, but incurs a computational cost.

Algorithm  OG  can  be  adapted  to  work  in  each  of  these  three  feedback  settings.  In  all  cases, 
the  high-level  idea  is  to  replace  the  unknown  quantities  used  by  OG  with  (unbiased)  estimates  of 
those quantities. This technique has been used in a number of online algorithms (e.g., see [2, 4, 7]).
Specifically, for each day $i$ and expert $j$, let $\hat{x}^i_j \in [0, 1]$ be an estimate of $x^i_j$ such that

$$E[\hat{x}^i_j] = \gamma x^i_j + \delta^i$$

for some constant $\delta^i$ (which is independent of $j$). In other words, we require that $\frac{1}{\gamma} (\hat{x}^i_j - \delta^i)$ is
an unbiased estimate of $x^i_j$. Furthermore, let $\hat{x}^i_j$ be independent of the choices made by the experts
algorithm.

Let $\mathcal{E}$ be an experts algorithm, and let $\mathcal{E}'$ be the experts algorithm that results from feeding back
$\hat{x}^i_j$ to $\mathcal{E}$ (in place of $x^i_j$) as the payoff $\mathcal{E}$ would have received by selecting expert $j$ on day $i$. The
following lemma relates the performance of $\mathcal{E}'$ to that of $\mathcal{E}$.

Lemma 5. The worst-case expected regret that $\mathcal{E}'$ can incur over a sequence of $n$ days is at most
$\frac{R}{\gamma}$, where $R$ is the worst-case expected regret that $\mathcal{E}$ can incur over a sequence of $n$ days.

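Proof. Let $\hat{x} = \langle \hat{x}^1, \hat{x}^2, \ldots, \hat{x}^n \rangle$ be the sequence of estimated payoffs. Because the estimates $\hat{x}^i_j$
are independent of the choices made by $\mathcal{E}'$, we may imagine for the purposes of analysis that $\hat{x}$ is
fixed in advance. Fix some expert $j$. By definition of $R$,

$$E\left[ \sum_{i=1}^{n} \hat{x}^i_{e_i} \,\middle|\, \hat{x} \right] \ge \sum_{i=1}^{n} \hat{x}^i_j - R.$$

Taking the expectation of both sides with respect to the choice of $\hat{x}$ then yields

$$E\left[ \sum_{i=1}^{n} \left( \gamma x^i_{e_i} + \delta^i \right) \right] \ge \sum_{i=1}^{n} \left( \gamma x^i_j + \delta^i \right) - R,$$

or rearranging,

$$E\left[ \sum_{i=1}^{n} x^i_{e_i} \right] \ge \sum_{i=1}^{n} x^i_j - \frac{R}{\gamma}.$$

Because $j$ was arbitrary, it follows that $\mathcal{E}'$ has worst-case expected regret $\frac{R}{\gamma}$.

4.4.1 The priced feedback model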

In the priced feedback model, we use a technique similar to that of Q. With probability $\gamma$, we will
pay cost $C$ in order to reveal $f_i$ and then feed the usual payoffs back to each experts algorithm $\mathcal{E}_t$.
Otherwise, with probability $1 - \gamma$, we feed back zero payoffs to each $\mathcal{E}_t$ (note that without paying
cost $C$, we receive no information whatsoever about $f_i$, and thus we have no basis for assigning
different payoffs to different actions). We refer to this algorithm as $\mathrm{OG}^p$. By Lemma 5, $E[r_t]$
is bounded by $\frac{1}{\gamma}$ times the worst-case regret of $\mathcal{E}_t$. By bounding $E[R_{\mathrm{coverage}}]$ and $E[R_{\mathrm{cost}}]$ as a
function of $\gamma$ and then optimizing $\gamma$ to minimize the bounds, we obtain the following theorem, a
complete proof of which is given in Appendix A.
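The feedback rule just described can be sketched as below. The function name `priced_feedback` and the toy payoffs are assumptions made for illustration. The simulation checks the property needed to apply Lemma 5 with $\delta^i = 0$: the payoff fed back has expectation $\gamma$ times the true payoff.

```python
import random


def priced_feedback(true_payoffs, gamma, rng):
    """Return (payoffs fed to the experts algorithms, whether the price C was paid)."""
    if rng.random() < gamma:
        return list(true_payoffs), True          # pay C and reveal the true payoffs
    return [0.0] * len(true_payoffs), False      # no information: feed back zeros


rng = random.Random(0)
gamma = 0.2
true_payoffs = [0.1, 0.5, 0.9]
rounds = 50_000
totals = [0.0] * len(true_payoffs)
paid = 0
for _ in range(rounds):
    fed, did_pay = priced_feedback(true_payoffs, gamma, rng)
    paid += did_pay
    totals = [s + x for s, x in zip(totals, fed)]

# Dividing the average fed-back payoff by gamma recovers the true payoff.
rescaled = [s / rounds / gamma for s in totals]
paid_fraction = paid / rounds
```

A smaller $\gamma$ pays the price $C$ less often but inflates the experts algorithms' regret by the factor $\frac{1}{\gamma}$, which is the trade-off optimized in the theorem that follows.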

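Theorem 11. Algorithm $\mathrm{OG}^p$, run with WMR as the subroutine experts algorithm, has
$E[R_{\mathrm{coverage}}] = O((C \ln |\mathcal{A}|)^{\frac{1}{3}} (T n)^{\frac{2}{3}})$ (when run with input $L = T$) and has
$E[R_{\mathrm{cost}}] = O((T \ln n)^{\frac{5}{3}} (C \ln |\mathcal{A}|)^{\frac{1}{3}} n^{\frac{2}{3}})$ (when run with input $L = T \ln n$) in the priced feedback model.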

4.4.2  The  partially  transparent  feedback  model 

In the partially transparent feedback model, each $\mathcal{E}_t$ will run a copy of the Exp3 algorithm [2],
which is a randomized experts algorithm that only requires as feedback the payoff of the expert
it actually selects. In the partially transparent feedback model, if $\mathcal{E}_t$ selects action $a^i_t = (v, \tau)$ on
round $i$, it will receive feedback $f_i(S_{i,t-1} \oplus a^i_t) - f_i(S_{i,t-1})$ if $a^i_t$ is appended to the schedule (with
probability $\frac{1}{\tau}$), and will receive zero payoff otherwise. Observe that the information necessary to
compute these payoffs is revealed in the partially transparent feedback model. Furthermore, the
expected payoff that $\mathcal{E}_t$ receives if it selects action $a$ is $x^i_{t,a}$, and the payoff that $\mathcal{E}_t$ receives from
choosing action $a$ on round $i$ is independent from the choices made by $\mathcal{E}_t$ on previous rounds. Thus,
by Lemma 5, the worst-case expected regret bounds of Exp3 can be applied to the true payoffs $x^i_{t,a}$.

The worst-case expected regret of Exp3 is $O(\sqrt{n |\mathcal{A}| \ln |\mathcal{A}|})$, so $E[R] = O(L \sqrt{n |\mathcal{A}| \ln |\mathcal{A}|})$.
This bound, combined with Theorems 9 and 10, establishes the following theorem.

Theorem 12. Algorithm OG, run with Exp3 as the subroutine experts algorithm, has
$E[R_{\mathrm{coverage}}] = O(T \sqrt{n |\mathcal{A}| \ln |\mathcal{A}|})$ (when run with input $L = T$) and has
$E[R_{\mathrm{cost}}] = O((T \ln n)^2 \sqrt{n |\mathcal{A}| \ln |\mathcal{A}|})$ (when run with input $L = T \ln n$) in the partially
transparent feedback model.

4.4.3  The  opaque  feedback  model 

In the opaque feedback model, our algorithm and its analysis are similar to those of $\mathrm{OG}^p$. With
probability $1 - \gamma$, we feed back zero payoffs to each $\mathcal{E}_t$. Otherwise, with probability $\gamma$, we explore
as follows. Pick $t$ uniformly at random from $\{1, 2, \ldots, L\}$, and pick an action $a = (v, \tau)$ uniformly
at random from $\mathcal{A}$. Select the schedule $S_i = S_{i,t-1} \oplus a$. Observe $f_i(S_i)$, and feed $\frac{1}{\tau}$ times this value
back to $\mathcal{E}_t$ as the payoff associated with action $a$. Finally, feed back zero for all other payoffs.

We refer to this algorithm as $\mathrm{OG}^o$. The key to its analysis is the following observation. Letting
$\hat{x}^i_{t,a}$ denote the payoff to experts algorithm $\mathcal{E}_t$ for choosing action $a = (v, \tau)$ on round $i$, we have

$$E[\hat{x}^i_{t,a}] = \gamma \cdot \frac{1}{L} \cdot \frac{1}{|\mathcal{A}|} \cdot \frac{1}{\tau} f_i(S_{i,t-1} \oplus a) = \frac{\gamma}{L |\mathcal{A}|} x^i_{t,a} + \delta^i$$

where $x^i_{t,a} = \frac{1}{\tau} \left( f_i(S_{i,t-1} \oplus a) - f_i(S_{i,t-1}) \right)$ and $\delta^i = \frac{\gamma}{L |\mathcal{A}|} \cdot \frac{1}{\tau} f_i(S_{i,t-1})$. Thus, $\hat{x}^i_{t,a}$ is a biased estimate
of the correct payoff, and Lemma 5 implies that $E[r_t]$ is at most $\frac{L |\mathcal{A}|}{\gamma}$ times the worst-case expected
regret of $\mathcal{E}_t$.

The performance of $\mathrm{OG}^o$ is summarized in the following theorem, which we prove in
Appendix A.

Theorem 13. Algorithm $\mathrm{OG}^o$, run with WMR as the subroutine experts algorithm, has
$E[R_{\mathrm{coverage}}] = O(T (|\mathcal{A}| \ln |\mathcal{A}|)^{\frac{1}{3}} n^{\frac{2}{3}})$ (when run with input $L = T$) and has
$E[R_{\mathrm{cost}}] = O((T \ln n)^2 (|\mathcal{A}| \ln |\mathcal{A}|)^{\frac{1}{3}} n^{\frac{2}{3}})$ (when run with input $L = T \ln n$) in the opaque feedback model.


4.5  Lower  bounds  on  regret 


In Appendix A we prove the following lower bounds on regret. The lower bounds apply to the
online versions of two set-covering problems: Max $k$-Coverage and Min-Sum Set Cover.
The offline versions of these two problems were defined in §1.4. The online versions are special
cases of the online versions of Budgeted Maximum Submodular Coverage and Min-Sum
Submodular Cover, respectively. For a formal description of the online set covering
problems, see the text leading up to the proofs of Theorems 14 and 15 in Appendix A.

It is worth pointing out that the lower bounds hold even in a distributional online setting in
which the jobs $f_1, f_2, \ldots, f_n$ are drawn independently at random from a fixed distribution.

Theorem 14. Any algorithm for online Max $k$-Coverage has worst-case expected 1-regret
$\Omega\left(\sqrt{T n \ln \frac{|\mathcal{V}|}{T}}\right)$, where $\mathcal{V}$ is the collection of sets and $T = k$ is the number of sets selected
by the online algorithm on each round.

Theorem 15. Any algorithm for online Min-Sum Set Cover has worst-case expected 1-regret
$\Omega\left(T \sqrt{T n \ln \frac{|\mathcal{V}|}{T}}\right)$, where $\mathcal{V}$ is a collection of sets and $T$ is the number of sets selected by the
online algorithm on each round.


In Appendix A we show that there exist exponential-time online algorithms for these online set
covering problems whose regret matches the lower bounds in Theorem 14 (resp. Theorem 15) up
to constant (resp. logarithmic) factors.

Note that the upper bounds in Theorem 8 match the lower bounds in Theorems 14 and 15 up to
logarithmic factors, although the former apply to $(1 - \frac{1}{e})$-regret and 4-regret rather than 1-regret.

4.6 Refining the online greedy algorithm

We  now  discuss  two  simple  modifications  to  OG  that  do  not  improve  its  worst-case  guarantees, 
but  that  often  improve  its  performance  in  practice  (we  make  use  of  both  of  these  modifications  in 
our  experimental  evaluation). 

4.6.1  Avoiding  duplicate  actions 

In many practical applications, it is never worthwhile to perform the same action twice. As an
example, suppose that an action $a = (v, \tau)$ represents performing a run of length $\tau$ of a deterministic
algorithm $v$ (and then removing the run from memory), and $f(S) = 1$ if performing the actions in
$S$ yields a solution to a problem instance, and $f(S) = 0$ otherwise. Clearly, performing $a$ twice can
never increase the value of $f$. In cases such as this, the online algorithm OG as currently defined
may never "figure out" that it should avoid performing the same action twice, as the following
example illustrates.

Example 2. Let $\mathcal{A} = \{a_1, a_2, \ldots, a_T\}$ be a set of $T$ actions that each take unit time, and for all
$i$, let $f_i(S)$ equal $\frac{1}{T}$ times the number of distinct actions that appear in $S$. Thus, the schedule
$S^* = \langle a_1, a_2, \ldots, a_T \rangle$ has $f_i(S^*) = 1$ for all $i$, and is optimal in terms of coverage. Suppose we
run OG on the sequence of jobs $\langle f_1, f_2, \ldots, f_n \rangle$. All actions yield equal payoff to $\mathcal{E}_1$. If $\mathcal{E}_1$ is a
standard experts algorithm such as randomized weighted majority, it will choose actions uniformly
at random. Given that $\mathcal{E}_1$ chooses actions uniformly at random, $\mathcal{E}_2$ will (asymptotically) choose
actions uniformly at random as well. Inductively, all actions will be chosen at random. If so, the
probability that any particular experts algorithm selects a unique action is $1 - (1 - \frac{1}{T})^T$ (which
approaches $1 - \frac{1}{e}$ as $T \to \infty$). By linearity of expectation, the expected fraction of actions that are
unique is exactly this quantity.
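The quantity in Example 2 is easy to evaluate. The sketch below uses two hypothetical helper functions: the first evaluates $1 - (1 - \frac{1}{T})^T$ directly, and the second averages, over the $T$ positions in the schedule, the probability that a uniformly random draw differs from all earlier draws. The second reading of "selects a unique action" is an interpretation adopted for this illustration; the two expressions agree by summing a geometric series.

```python
import math


def expected_distinct_fraction(T):
    """Expected fraction of distinct actions among T uniform draws from T actions."""
    return 1.0 - (1.0 - 1.0 / T) ** T


def average_novelty_probability(T):
    """Average over t of the probability that draw t differs from all earlier draws."""
    return sum((1.0 - 1.0 / T) ** (t - 1) for t in range(1, T + 1)) / T


small = expected_distinct_fraction(10)
large = expected_distinct_fraction(10**6)
limit = 1.0 - 1.0 / math.e
```

Even for moderate $T$ the expected coverage is already close to the limiting value $1 - \frac{1}{e}$, well short of the coverage 1 achieved by $S^*$.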

To improve performance on examples such as this one, we may force the online algorithm to return a schedule with no duplicate actions as follows. Just before job $f_i$ arrives, obtain from each experts algorithm $\mathcal{E}_t$ a distribution over $\mathcal{A}$ (for experts algorithms such as randomized weighted majority, it is straightforward to obtain this distribution explicitly). We then sample from these distributions as follows. We first sample from $\mathcal{E}_1$ to obtain an action $a_1^i$. To obtain action $a_t^i$ for $t > 1$, we repeatedly sample from the distribution returned by $\mathcal{E}_t$ until we obtain an action not in the set $\{a_1^i, a_2^i, \ldots, a_{t-1}^i\}$ (given the distribution, we can simulate this step without actually performing repeated sampling).
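The sampling step can be sketched as follows. This is a minimal illustration, assuming each experts algorithm exposes its distribution as a dictionary from actions to probabilities; the function name and data layout are hypothetical. Resampling until a new action appears is simulated exactly by dropping the actions already chosen and renormalizing the remaining probability mass.

```python
import random


def sample_without_duplicates(distributions, rng=None):
    """Draw one action per experts algorithm, never repeating an action.

    distributions[t] maps each action to the probability assigned to it by
    the t-th experts algorithm (assumed layout, for illustration only).
    """
    rng = rng or random.Random()
    chosen = []
    for dist in distributions:
        # Keep only unused actions; random.choices renormalizes the weights.
        remaining = {a: p for a, p in dist.items() if a not in chosen and p > 0}
        if not remaining:
            raise ValueError("no unused action has positive probability")
        actions = list(remaining)
        weights = [remaining[a] for a in actions]
        chosen.append(rng.choices(actions, weights=weights, k=1)[0])
    return chosen
```

With $T$ uniform distributions over $T$ actions, as in Example 2, the returned schedule always contains every action exactly once.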

With this modification, OG always achieves coverage 1 for the jobs described in Example 2. Furthermore, this modification preserves the worst-case guarantees of the original version of OG (under the assumption that performing the same action twice never increases the value of any function $f_i$). Informally, this follows from the fact that the expected payoff received by sampling from the modified distribution can never be smaller than the expected payoff received by sampling from the original distribution (because the payoffs associated with the experts corresponding to actions already in the schedule are all zero). For this reason, this modification never increases the worst-case regret of the experts algorithms, and our previous analysis carries through unchanged.




4.6.2  Independent  versus  dependent  probabilities 

Recall that in the case of arbitrary-cost actions, when an experts algorithm selects an action $(v, \tau)$ we add this action to the schedule independently with probability $\frac{1}{\tau}$. The fact that this addition is performed independently of the actions that are already in the schedule can lead to undesirable behavior, as the following example illustrates.

Example 3. Let $\mathcal{V} = \{v\}$ consist of a single activity, let $f(S) = 1$ if $S$ contains the action $(v, T)$, and let $f(S) = 0$ otherwise. Thus, the schedule $S^* = \langle (v, T) \rangle$ maximizes $f(S_{\langle T \rangle})$. However, $\mathbb{E}[f(S)] \le 1 - (1 - \frac{1}{T})^T$ if $S$ is a schedule returned by OG. This is true because at most $T$ experts algorithms can select the action $(v, T)$, but in each case the action is only added to the schedule with probability $\frac{1}{T}$, so the probability that $(v, T)$ is added to the schedule is at most $1 - (1 - \frac{1}{T})^T$, which approaches $1 - \frac{1}{e}$ as $T \to \infty$.

We can fix this problem as follows. When experts algorithm $\mathcal{E}_t$ selects an action $a_t = (v, \tau)$, we increase the probability that the action is in the schedule by $\frac{1}{\tau}$. In other words, if $a_t$ has been picked by $k$ experts algorithms so far but has still not been added to the schedule, then we add it to the schedule with probability $\frac{1}{\tau - k}$. Thus, if $\tau$ consecutive experts algorithms select the same action $(v, \tau)$, it will always be added to the schedule exactly once.
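The dependent-probability rule can be checked with exact arithmetic. The sketch below assumes that "picked by $k$ experts algorithms so far" counts the earlier picks that did not add the action, so that the conditional probability of adding it on the next pick is $1/(\tau - k)$; both function names are hypothetical.

```python
from fractions import Fraction


def conditional_add_probability(tau, k):
    """Probability of adding (v, tau) now, given k earlier picks that did not add it."""
    if not 0 <= k < tau:
        raise ValueError("need 0 <= k < tau")
    return Fraction(1, tau - k)


def cumulative_add_probability(tau, picks):
    """Probability that (v, tau) is in the schedule after `picks` selections."""
    not_added = Fraction(1)
    for k in range(picks):
        not_added *= 1 - conditional_add_probability(tau, k)
    return 1 - not_added
```

Under this assumption each pick raises the cumulative probability by exactly $1/\tau$, and $\tau$ picks add the action with certainty.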

The schedules produced by this modified online algorithm still consume $T$ time steps in expectation, and our previous analysis carries through to give the same regret bounds on $R_{\text{coverage}}$ that were stated earlier. Unfortunately, the analysis for the bounds on $R_{\text{cost}}$ stated in Theorem 10 depends critically on the use of independent probabilities, and does not carry through after having made this modification. Nevertheless, in our experiments in §6 we found that this modification was helpful in practice.


5  Open  Problems 

The  results  presented  in  this  paper  suggest  several  open  problems: 

1. Avoiding discretization. As currently defined, our online algorithm can only handle a finite set of actions $\mathcal{A}$. Thus, to apply this online algorithm to a problem in which the actions have real-valued durations between 0 and 1, one might discretize the durations to be in the set $\{\frac{1}{T}, \frac{2}{T}, \ldots, 1\}$. To achieve the best performance, one would like to set $T$ as large as possible, but the time and space required by the online algorithm grow linearly with $T$. It would be desirable to avoid discretization altogether, perhaps after making additional smoothness assumptions about the jobs $f_i$. A possible approach would be to consider the limiting behavior of our algorithm as $T \to \infty$, for some particular choice of subroutine experts algorithm.

2. Lower bounds on 4-regret and $1 - \frac{1}{e}$ regret. The lower bounds proved in §4.5 apply only to 1-regret, whereas our online algorithms optimize either 4-regret (in the case of $R_{\text{cost}}$) or $1 - \frac{1}{e}$ regret (in the case of $R_{\text{coverage}}$). It would be interesting to prove lower bounds on $R_{\text{cost}}$ and $R_{\text{coverage}}$. Such lower bounds would hold for online algorithms that make decisions in polynomial time, under the assumption that P ≠ NP.

3. An online version of the refined greedy approximation algorithm $G'$. Recall that in §3.2 we showed that the offline greedy approximation algorithm is suboptimal for a simple job involving two activities, and then considered an alternative greedy approximation algorithm that produces an optimal schedule for this job. The online algorithm presented in §4 is based on the original greedy approximation algorithm, and thus it also performs suboptimally on this simple example. Although it appears non-trivial to do so, it would be interesting to develop an online version of the alternative greedy approximation algorithm that performs optimally on such examples.

6  Experimental  Evaluation  on  SAT  2007  Competition  Data 

The annual SAT solver competition (www.satcompetition.org) is designed to encourage the development of efficient Boolean satisfiability solvers, which are used as subroutines in state-of-the-art model checkers, theorem provers, and planners. The competition consists of running each submitted solver on a number of benchmark instances, with a per-instance time limit. Solvers are ranked according to the number of instances they solve within each of three instance categories: industrial, random, and hand-crafted.

In this section we evaluate the online algorithm OG by using it to combine solvers from the 2007 SAT competition. To do so, we used data available on the competition web site³ to construct a matrix $\tau$, where $\tau_{i,j}$ is the time that the $j$th solver required on the $i$th benchmark instance. We used this data to determine whether or not a given schedule would solve an instance within the time limit $T$ (schedule $S$ solves instance $i$ if and only if, for some $j$, $S_{\langle T \rangle}$ contains actions $(h_j, \tau_1), (h_j, \tau_2), \ldots, (h_j, \tau_L)$ with $\sum_{l=1}^{L} \tau_l \ge \tau_{i,j}$). Within each instance category, we compared OG to the offline greedy schedule, to the individual solver that solved the most instances within the time limit, and to a schedule that ran each solver in parallel at equal strength. We ran OG in the full-information feedback model.
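The solvability test described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the experiments: the function name and argument layout are hypothetical, time invested in the same solver is assumed to accumulate across actions, and truncating the schedule at the time limit is assumed to cut the last action short.

```python
def solves(schedule, times_for_instance, time_limit):
    """Return True if `schedule`, truncated at `time_limit`, solves the instance.

    schedule: list of (solver_index, duration) actions, executed in order.
    times_for_instance: times_for_instance[j] is the time solver j needs on
        this instance (float('inf') if it never finishes).
    """
    invested = {}
    elapsed = 0.0
    for solver, duration in schedule:
        # Only the part of the action that fits before the time limit counts.
        usable = min(duration, time_limit - elapsed)
        if usable <= 0:
            break
        invested[solver] = invested.get(solver, 0.0) + usable
        if invested[solver] >= times_for_instance[solver]:
            return True
        elapsed += usable
    return False
```

Counting the instances for which `solves` returns True, one row of the matrix at a time, gives the kind of per-category totals reported in Table 1.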

Table 1 summarizes the results. In each category, the offline greedy schedule and the online greedy algorithm solved more instances than any solver that was entered in the competition, and solved more instances than the naive parallel schedule.


Table  1:  Number  of  benchmark  instances  solved  within  time  limit. 


Category (instances)    Offline    Online    Parallel    Top solver
Industrial (234)        147        149       132         139
Random (511)            350        347       302         257
Hand-crafted (201)      114        107       95          98

³ We use the data from phase 2 of the competition, available at http://www.cril.univ-artois.fr/SAT07/
7  Conclusions


This paper considered an online resource allocation problem that generalizes several previously studied online problems, and that has applications to algorithm portfolio design and the optimization of query processing in databases. The main contribution of this paper was an online version of a greedy approximation algorithm whose worst-case performance guarantees in the offline setting are the best possible assuming P ≠ NP.
Appendix  A:  Additional  Proofs


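Theorem 5. $c(f, G') \le 4 \int_{t=0}^{\infty} 1 - \max_{S \in \mathcal{S}} \{ f(S_{\langle t \rangle}) \} \, dt \le 4 \min_{S \in \mathcal{S}} \{ c(f, S) \}$.

Proof. Recall that $G' = \langle g'_1, g'_2, \ldots \rangle$, where $G'_j = \langle g'_1, g'_2, \ldots, g'_{j-1} \rangle$ and

\[
g'_j = \operatorname*{arg\,max}_{(v,\tau) \in \mathcal{V} \times \mathbb{R}_{>0}} \left\{ \frac{f(G'_j \oplus \langle (v,\tau) \rangle) - f(G'_j)}{\int_{t'=0}^{\tau} 1 - f(G'_j \oplus \langle (v,t') \rangle) \, dt'} \right\} . \tag{7.1}
\]

Let $s'_j$ equal the value of the maximum attained in the $j$th evaluation of the arg max in (7.1), multiplied by the quantity $1 - f(G'_j)$. We will make use of the following claim.

Claim 1. For any schedule $S$, any positive integer $j$, and any $t > 0$, $f(S_{\langle t \rangle}) \le f(G'_j) + t \cdot s'_j$.

Proof. Fix an action $a = (v, \tau)$. By monotonicity of $f$, we have $\int_{t'=0}^{\tau} 1 - f(G'_j \oplus \langle (v,t') \rangle) \, dt' \le \tau (1 - f(G'_j))$, or equivalently,

\[
\frac{1}{\tau} \le \frac{1 - f(G'_j)}{\int_{t'=0}^{\tau} 1 - f(G'_j \oplus \langle (v,t') \rangle) \, dt'} .
\]

This and the definition of $s'_j$ imply

\[
\frac{f(G'_j \oplus \langle a \rangle) - f(G'_j)}{\tau} \le \left(1 - f(G'_j)\right) \cdot \frac{f(G'_j \oplus \langle a \rangle) - f(G'_j)}{\int_{t'=0}^{\tau} 1 - f(G'_j \oplus \langle (v,t') \rangle) \, dt'} \le s'_j .
\]

The claim then follows by exactly the same argument that was used to prove Fact 1. □

The remainder of the proof parallels the proof of Theorem 4. Using Claim 1 and the argument in the proof of Theorem 4 we get that

\[
\int_{t=0}^{\infty} 1 - \max_{S \in \mathcal{S}} \{ f(S_{\langle t \rangle}) \} \, dt \ge \sum_{j \ge 1} x_j (y_j - y_{j+1}) ,
\]

where $x_j = \frac{R_j}{2 s'_j}$, $y_j = \frac{R_j}{2}$, and $R_j = 1 - f(G'_j)$. Letting $g'_j = (v_j, \tau_j)$, we have

\[
\sum_{j \ge 1} x_j (y_j - y_{j+1}) = \frac{1}{4} \sum_{j \ge 1} \int_{t'=0}^{\tau_j} 1 - f(G'_j \oplus \langle (v_j, t') \rangle) \, dt' = \frac{1}{4} c(f, G') ,
\]

which proves the theorem. □
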


We now prove the theorems concerning the performance of the greedy schedule $G$, in which the $j$th evaluation of the arg max in (3.1) is performed with additive error $\epsilon_j$. To ease notation, let $G = \langle g_1, g_2, \ldots \rangle$, where $g_j = (v_j, \tau_j)$. Let $s_j = \frac{f(G_{j+1}) - f(G_j)}{\tau_j}$. To prove Theorems 6 and 7 we will make use of the following fact, which can be proved in exactly the same way as Fact 1.

Fact 2. For any schedule $S$, any positive integer $j$, and any $t > 0$, we have $f(S_{\langle t \rangle}) \le f(G_j) + t \cdot (s_j + \epsilon_j)$.


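Theorem 6. Let $L$ be a positive integer, and let $T = \sum_{j=1}^{L} \tau_j$, where $g_j = (v_j, \tau_j)$. Then

\[
f(G_{\langle T \rangle}) \ge \left(1 - \frac{1}{e}\right) \max_{S \in \mathcal{S}} \{ f(S_{\langle T \rangle}) \} - \sum_{j=1}^{L} \epsilon_j \tau_j .
\]

Proof. Let $C^* = \max_{S \in \mathcal{S}} \{ f(S_{\langle T \rangle}) \}$, and for any positive integer $j$, let $\Delta_j = C^* - f(G_j)$. By Fact 2, $C^* \le f(G_j) + T(s_j + \epsilon_j)$. Thus

\[
\Delta_j \le T(s_j + \epsilon_j) = T \left( \frac{\Delta_j - \Delta_{j+1}}{\tau_j} + \epsilon_j \right) .
\]

Rearranging this inequality gives $\Delta_{j+1} \le \Delta_j \left(1 - \frac{\tau_j}{T}\right) + \tau_j \epsilon_j$. Unrolling this inequality (and using the fact that $1 - \frac{\tau_j}{T} < 1$ for all $j$), we get

\[
\Delta_{L+1} \le \Delta_1 \left( \prod_{j=1}^{L} \left(1 - \frac{\tau_j}{T}\right) \right) + \sum_{j=1}^{L} \tau_j \epsilon_j .
\]

Let $E = \sum_{j=1}^{L} \tau_j \epsilon_j$. Subject to the constraint $\sum_{j=1}^{L} \tau_j = T$, the product series is maximized when $\tau_j = \frac{T}{L}$ for all $j$. Thus we have

\[
C^* - f(G_{L+1}) = \Delta_{L+1} \le \Delta_1 \left(1 - \frac{1}{L}\right)^{L} + E < \Delta_1 \frac{1}{e} + E \le C^* \frac{1}{e} + E .
\]

Thus $f(G_{L+1}) \ge (1 - \frac{1}{e}) C^* - E$, as claimed. □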
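The unrolling step in this proof can be checked numerically. The sketch below iterates the worst case of the recurrence $\Delta_{j+1} \le \Delta_j (1 - \tau_j / T) + \tau_j \epsilon_j$, treating it as an equality, so that the result can be compared with $\Delta_1 / e + E$. The function name and the sample values are hypothetical.

```python
import math


def unrolled_gap(delta_1, taus, epsilons):
    """Iterate Delta_{j+1} = Delta_j * (1 - tau_j / T) + tau_j * eps_j with T = sum(taus)."""
    total = sum(taus)
    delta = delta_1
    for tau, eps in zip(taus, epsilons):
        delta = delta * (1 - tau / total) + tau * eps
    return delta


if __name__ == "__main__":
    taus = [1.0, 2.0, 3.0, 4.0]
    epsilons = [0.01, 0.0, 0.02, 0.0]
    error_term = sum(t * e for t, e in zip(taus, epsilons))
    print(unrolled_gap(1.0, taus, epsilons), "<=", 1.0 / math.e + error_term)
```

Unequal durations make the product of the factors $(1 - \tau_j / T)$ smaller than in the equal-duration case, which is why the equal split is the worst case used in the proof.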
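Theorem 7. Let $L$ be a positive integer, and let $T = \sum_{j=1}^{L} \tau_j$, where $g_j = (v_j, \tau_j)$. For any schedule $S$, define $c^T(f, S) = \int_{t=0}^{T} 1 - f(S_{\langle t \rangle}) \, dt$. Then

\[
c^T(f, G) \le 4 \int_{t=0}^{\infty} 1 - \max_{S \in \mathcal{S}} \{ f(S_{\langle t \rangle}) \} \, dt + \sum_{j=1}^{L} E_j \tau_j ,
\]

where $E_j = \sum_{l < j} \epsilon_l \tau_l$.

Proof. Let $R_j = 1 - f(G_j)$, and let $R'_j = R_j - E_j$. Assume for the moment that $R_L \ge E_L$, so that $R'_j$ is non-negative for $j \le L$. Let $s'_j = s_j + \epsilon_j$. By construction,

\[
R'_j - R'_{j+1} = f(G_{j+1}) - f(G_j) + \epsilon_j \tau_j = \tau_j s'_j . \tag{7.2}
\]

Let $x_j = \frac{R'_j}{2 s'_j}$; let $y_j = \frac{R'_j}{2}$; and let $h(x) = 1 - \max_{S \in \mathcal{S}} \{ f(S_{\langle x \rangle}) \}$. By Fact 2,

\[
\max_{S \in \mathcal{S}} \{ f(S_{\langle x_j \rangle}) \} \le f(G_j) + x_j s'_j = f(G_j) + \frac{R'_j}{2} .
\]

Thus $h(x_j) \ge R_j - \frac{R'_j}{2} = \frac{R'_j}{2} + E_j \ge y_j$. The monotonicity of $f$ implies that $h(x)$ is non-increasing and (together with the fact that $E_j$ is non-decreasing as a function of $j$) implies that the sequence $\langle y_1, y_2, \ldots \rangle$ is non-increasing. As illustrated in Figure 1, these facts imply that $\int_{x=0}^{\infty} h(x) \, dx \ge \sum_{j=1}^{L} x_j (y_j - y_{j+1})$. Thus we have

\[
\begin{aligned}
\int_{t=0}^{\infty} 1 - \max_{S \in \mathcal{S}} \{ f(S_{\langle t \rangle}) \} \, dt &= \int_{x=0}^{\infty} h(x) \, dx \\
&\ge \sum_{j=1}^{L} x_j (y_j - y_{j+1}) && \text{(Figure 1)} \\
&= \frac{1}{4} \sum_{j=1}^{L} R'_j \tau_j && \text{(equation (7.2))} \\
&= \frac{1}{4} \left( \sum_{j=1}^{L} R_j \tau_j - \sum_{j=1}^{L} E_j \tau_j \right) \\
&\ge \frac{1}{4} c^T(f, G) - \frac{1}{4} \sum_{j=1}^{L} E_j \tau_j && \text{(monotonicity of } f \text{)}
\end{aligned}
\]

which proves the theorem, subject to the assumption that $R_L \ge E_L$.

Now suppose $R_L < E_L$. Let $K$ be the largest integer such that $R_K \ge E_K$, and let $T_K = \sum_{j=1}^{K} \tau_j$. By the argument just given,

\[
c^{T_K}(f, G) \le 4 \int_{t=0}^{\infty} 1 - \max_{S \in \mathcal{S}} \{ f(S_{\langle t \rangle}) \} \, dt + \sum_{j=1}^{K} E_j \tau_j .
\]

Thus to prove the theorem, it suffices to show that $c^T(f, G) \le c^{T_K}(f, G) + \sum_{j=K+1}^{L} E_j \tau_j$. This holds because

\[
\begin{aligned}
c^T(f, G) - c^{T_K}(f, G) &= \int_{t=T_K}^{T} 1 - f(G_{\langle t \rangle}) \, dt \\
&\le (T - T_K) \left(1 - f(G_{\langle T_K \rangle})\right) \\
&= (T - T_K) R_{K+1} \\
&\le (T - T_K) E_{K+1} \\
&\le \sum_{j=K+1}^{L} E_j \tau_j .
\end{aligned}
\]

□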


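Lemma 2. Any sequence $\langle f_1, f_2, \ldots, f_n \rangle$ of jobs satisfies Condition 2. That is, for any sequence $S_1, S_2, \ldots, S_n$ of schedules and any schedule $S$,

\[
\frac{\sum_{i=1}^{n} f_i(S_i \oplus S) - f_i(S_i)}{\ell(S)} \le \max_{(v,\tau) \in \mathcal{V} \times \mathbb{R}_{>0}} \left\{ \frac{\sum_{i=1}^{n} f_i(S_i \oplus \langle (v,\tau) \rangle) - f_i(S_i)}{\tau} \right\} .
\]

Proof. Let $r$ denote the right hand side of the inequality. Let $S = \langle a_1, a_2, \ldots, a_L \rangle$, where $a_l = (v_l, \tau_l)$. Let

\[
\Delta_{i,l} = f_i(S_i \oplus \langle a_1, a_2, \ldots, a_l \rangle) - f_i(S_i \oplus \langle a_1, a_2, \ldots, a_{l-1} \rangle) .
\]

We have

\[
\begin{aligned}
\sum_{i=1}^{n} f_i(S_i \oplus S) &= \sum_{i=1}^{n} \left( f_i(S_i) + \sum_{l=1}^{L} \Delta_{i,l} \right) && \text{(telescoping series)} \\
&\le \sum_{i=1}^{n} \left( f_i(S_i) + \sum_{l=1}^{L} \left( f_i(S_i \oplus \langle a_l \rangle) - f_i(S_i) \right) \right) && \text{(submodularity)} \\
&= \sum_{i=1}^{n} f_i(S_i) + \sum_{l=1}^{L} \sum_{i=1}^{n} \left( f_i(S_i \oplus \langle a_l \rangle) - f_i(S_i) \right) \\
&\le \sum_{i=1}^{n} f_i(S_i) + \sum_{l=1}^{L} r \cdot \tau_l && \text{(definition of } r \text{)} \\
&= \sum_{i=1}^{n} f_i(S_i) + r \cdot \ell(S) .
\end{aligned}
\]

Rearranging this inequality gives $\frac{\sum_{i=1}^{n} f_i(S_i \oplus S) - f_i(S_i)}{\ell(S)} \le r$, as claimed. □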
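The inequality in Lemma 2 can be illustrated on a toy instance. The sketch below is restricted to unit-time actions and to simple coverage jobs, where job $i$ is completed as soon as the schedule contains any action from a given set; these restrictions, the function names, and the data layout are assumptions made for the illustration, not part of the lemma.

```python
def coverage(job, schedule):
    """f_i(S): 1 if some action in the schedule belongs to the job's helpful set."""
    return 1 if any(action in job for action in schedule) else 0


def lemma2_sides(jobs, prefixes, schedule, actions):
    """Return (left side, right side) of the Lemma 2 inequality for unit-time actions."""
    pairs = list(zip(jobs, prefixes))
    gain = sum(coverage(j, p + schedule) - coverage(j, p) for j, p in pairs)
    lhs = gain / len(schedule)  # unit-time actions: the schedule length is len(schedule)
    rhs = max(
        sum(coverage(j, p + [a]) - coverage(j, p) for j, p in pairs)
        for a in actions
    )
    return lhs, rhs
```

For three jobs with helpful sets `{0}`, `{1}`, `{0, 2}`, prefixes `[]`, `[1]`, `[]`, and the schedule `[0, 1, 2]`, the total gain of 2 is spread over three actions, while the single best action (action 0) already achieves a gain of 2 on its own.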

Lemma 4. Algorithm OG$^{\text{unit}}$ with randomized weighted majority as the subroutine experts algorithm has $\mathbb{E}[R] = O\left(\sqrt{Tn \ln |\mathcal{A}|}\right)$ in the worst case.

Proof. Let $k = |\mathcal{A}|$. Let $x_t$ be the total payoff received by $\mathcal{E}_t$, and let $g_t = x_t + r_t$ be the total payoff that could have been received by $\mathcal{E}_t$ in hindsight (had it been forced to choose a fixed expert each day). Because $\sum_{t=1}^{T} x_t \le n$, we have $\sum_{t=1}^{T} g_t \le n + R$. Using WMR, $\mathbb{E}[r_t] = O\left(\sqrt{g_t \ln k}\right)$. Moreover, with WMR the actual value of $r_t$ will be tightly concentrated about its expectation, as can be shown using Azuma's inequality. In particular, because $g_t \le n$, the probability that $R > n$ is exponentially small. Assuming $R \le n$, we have $\sum_{t=1}^{T} g_t \le 2n$. Subject to this constraint, $\sum_{t=1}^{T} \sqrt{g_t}$ is maximized when $g_t = \frac{2n}{T}$ for all $t$. Thus in the worst case, $\mathbb{E}[R] = O\left(\sqrt{Tn \ln k}\right)$.

□
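The maximization step at the end of this proof can be spelled out with the Cauchy-Schwarz inequality. The derivation below is a sketch that uses only the constraint $\sum_t g_t \le 2n$ from the proof; the appeal to Cauchy-Schwarz is a standard fact supplied here for illustration.

```latex
\[
\sum_{t=1}^{T} \sqrt{g_t}
  = \sum_{t=1}^{T} 1 \cdot \sqrt{g_t}
  \le \sqrt{\sum_{t=1}^{T} 1^2} \cdot \sqrt{\sum_{t=1}^{T} g_t}
  \le \sqrt{T} \cdot \sqrt{2n}
  = \sqrt{2nT},
\]
% Equality holds when every g_t equals 2n/T, which is the maximizer named in the proof.
\[
\sum_{t=1}^{T} O\!\left(\sqrt{g_t \ln k}\right)
  = O\!\left(\sqrt{\ln k} \cdot \sqrt{2nT}\right)
  = O\!\left(\sqrt{Tn \ln k}\right).
\]
```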


In order to prove Theorem 10 we first prove the following lemma. The lemma relates the expected cost of the schedule $S_i$ (selected by OG on round $i$) to the expected cost $S_i$ would incur if, hypothetically, each of the "meta-actions" selected by each experts algorithm $\mathcal{E}_t$ consumed unit time on every job (recall that this assumption was made in the analysis in the main text).

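Lemma 6. Fix a sequence of jobs $\langle f_1, f_2, \ldots, f_n \rangle$ and an integer $i$ ($1 \le i \le n$). Let $S_i$ be the schedule produced by OG to use on job $f_i$, and let $S_{i,t-1}$ denote the partial schedule that exists after the first $t - 1$ experts algorithms have selected actions. Then

\[
\mathbb{E}\left[ c^{\ell(S_i)}(f_i, S_i) \right] \le \mathbb{E}\left[ \sum_{t=1}^{L} \left(1 - f_i(S_{i,t-1})\right) \right] .
\]

Proof. Fix some $t$. Let $a_t^i = (v, \tau)$ be the action selected by $\mathcal{E}_t$ on round $i$, and define

\[
c_t^i = \begin{cases} \int_{t'=0}^{\tau} 1 - f_i(S_{i,t-1} \oplus \langle (v, t') \rangle) \, dt' & \text{if } a_t^i \text{ is appended to } S_i \\ 0 & \text{otherwise.} \end{cases}
\]

By construction, $c^{\ell(S_i)}(f_i, S_i) = \sum_{t=1}^{L} c_t^i$. Because $a_t^i$ is appended to $S_i$ with probability $\frac{1}{\tau}$ and because $f_i$ is monotone, we have

\[
\mathbb{E}\left[ c_t^i \mid S_{i,t-1} \right] = \frac{1}{\tau} \int_{t'=0}^{\tau} 1 - f_i(S_{i,t-1} \oplus \langle (v, t') \rangle) \, dt' \le 1 - f_i(S_{i,t-1}) .
\]

Taking the expectation of both sides yields $\mathbb{E}[c_t^i] \le \mathbb{E}[1 - f_i(S_{i,t-1})]$. Then by linearity of expectation,

\[
\mathbb{E}\left[ c^{\ell(S_i)}(f_i, S_i) \right] = \mathbb{E}\left[ \sum_{t=1}^{L} c_t^i \right] \le \mathbb{E}\left[ \sum_{t=1}^{L} \left(1 - f_i(S_{i,t-1})\right) \right] .
\]

□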


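Theorem 10. Algorithm OG, run with input $L = T \ln n$, has $\mathbb{E}[R_{\text{cost}}] = O(T \ln n \cdot \mathbb{E}[R] + T\sqrt{n})$. In particular, $\mathbb{E}[R_{\text{cost}}] = O\left((\ln n)^{\frac{3}{2}} T \sqrt{Tn \ln |\mathcal{A}|}\right)$ if WMR is used as the subroutine experts algorithm.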


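Proof. The arguments in the main text showed that OG can be viewed as a version of the greedy schedule for the function $f = \frac{1}{n} \sum_{i=1}^{n} f_i$, in which the $t$th decision is made with additive error $\epsilon_t$, under the assumption that all "meta-actions" $a_t^i$ require unit time on every job. Thus by Theorem 7 we have

\[
\sum_{i=1}^{n} \sum_{t=1}^{L} \left(1 - f_i(S_{i,t-1})\right) \le 4 \cdot \min_{S \in \mathcal{S}} \left\{ \sum_{i=1}^{n} c(f_i, S) \right\} + nL \sum_{t=1}^{L} \epsilon_t . \tag{7.3}
\]

Also recall from the main text that $\mathbb{E}[\epsilon_t] = \mathbb{E}\left[\frac{r_t}{n}\right]$, where $r_t$ is the regret experienced by $\mathcal{E}_t$, and that we define $R = \sum_{t=1}^{L} r_t$. Thus, we have

\[
\begin{aligned}
\mathbb{E}\left[ \sum_{i=1}^{n} c^{\ell(S_i)}(f_i, S_i) \right] &\le \mathbb{E}\left[ \sum_{i=1}^{n} \sum_{t=1}^{L} \left(1 - f_i(S_{i,t-1})\right) \right] && \text{(Lemma 6)} \\
&\le 4 \cdot \min_{S \in \mathcal{S}} \left\{ \sum_{i=1}^{n} c(f_i, S) \right\} + L \cdot \mathbb{E}[R] . && \text{(equation (7.3))}
\end{aligned}
\]

If it was always the case that $\ell(S_i) \ge T$, then we would have $c^T(f_i, S_i) \le c^{\ell(S_i)}(f_i, S_i)$, and this inequality would imply $\mathbb{E}[R_{\text{cost}}] \le L \cdot \mathbb{E}[R]$. In order to bound $\mathbb{E}[R_{\text{cost}}]$, we now address the possibility that $\ell(S_i) < T$. Letting $p_i = \mathbb{P}[\ell(S_i) < T]$, we have

\[
\begin{aligned}
\mathbb{E}\left[ c^T(f_i, S_i) \right] &= (1 - p_i) \cdot \mathbb{E}\left[ c^T(f_i, S_i) \mid \ell(S_i) \ge T \right] + p_i \cdot \mathbb{E}\left[ c^T(f_i, S_i) \mid \ell(S_i) < T \right] \\
&\le \mathbb{E}\left[ c^{\ell(S_i)}(f_i, S_i) \right] + p_i \cdot T .
\end{aligned}
\]

Putting these inequalities together yields

\[
\mathbb{E}[R_{\text{cost}}] \le L \cdot \mathbb{E}[R] + T \sum_{i=1}^{n} p_i . \tag{7.4}
\]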


We now bound $p_i$. As already mentioned, $\mathbb{E}[\ell(S_i)] = L$ regardless of which actions are selected by the various experts algorithms. If $L \gg T$, then $\ell(S_i)$ will be sharply concentrated about its mean, as we can prove using standard concentration inequalities (e.g., Theorem 5 of [3]). In particular, for any $\lambda > 0$, we have

\[
\mathbb{P}\left[ \ell(S_i) \le L - \lambda \right] \le \exp\left( -\frac{\lambda^2}{2LT} \right) .
\]

Setting $\lambda = L - T$ and simplifying yields $p_i \le \exp\left( -\frac{L}{2T} + 1 \right)$. Setting $L = T \ln n$ then yields $p_i \le \frac{e}{\sqrt{n}}$, so the right hand side of (7.4) is $O(T\sqrt{n})$. Thus $\mathbb{E}[R_{\text{cost}}] = O(T \ln n \cdot \mathbb{E}[R] + T\sqrt{n})$, as claimed. Substituting the bound on $\mathbb{E}[R]$ stated in Lemma 4 then proves the claim about WMR.

□


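Theorem 11. Algorithm OG$^{\text{p}}$, run with WMR as the subroutine experts algorithm, has $\mathbb{E}[R_{\text{coverage}}] = O\left( (C \ln |\mathcal{A}|)^{\frac{1}{3}} (Tn)^{\frac{2}{3}} \right)$ (when run with input $L = T$) and has $\mathbb{E}[R_{\text{cost}}] = O\left( (T \ln n)^{\frac{5}{3}} (C \ln |\mathcal{A}|)^{\frac{1}{3}} n^{\frac{2}{3}} \right)$ (when run with input $L = T \ln n$) in the priced feedback model.

Proof. Let $M$ be the number of exploration rounds (so $\mathbb{E}[M] = \gamma n$). The maximum payoff to any single expert cannot exceed $M$. Thus, by Lemma 5 and the regret bound of WMR, we have $\mathbb{E}[r_t \mid M] = O\left( \frac{1}{\gamma} \sqrt{M \ln |\mathcal{A}|} \right)$. Using the fact that $\mathbb{E}\left[\sqrt{X}\right] \le \sqrt{\mathbb{E}[X]}$ for any non-negative random variable $X$, this implies

\[
\mathbb{E}[r_t] = \mathbb{E}\left[ \mathbb{E}[r_t \mid M] \right] = O\left( \frac{1}{\gamma} \sqrt{\mathbb{E}[M] \ln |\mathcal{A}|} \right) = O\left( \sqrt{\frac{n}{\gamma} \ln |\mathcal{A}|} \right) .
\]

By the corresponding coverage theorem in the main text, we have $\mathbb{E}[R_{\text{coverage}}] \le \mathbb{E}[R] + C \gamma n = O\left( L \sqrt{\frac{n}{\gamma} \ln |\mathcal{A}|} \right) + C \gamma n$. Setting $\gamma = \left( \frac{L}{C} \sqrt{\frac{\ln |\mathcal{A}|}{n}} \right)^{\frac{2}{3}}$ then yields $\mathbb{E}[R_{\text{coverage}}] = O\left( (C \ln |\mathcal{A}|)^{\frac{1}{3}} (Ln)^{\frac{2}{3}} \right)$, as claimed.

Similarly, by Theorem 10 we have $\mathbb{E}[R_{\text{cost}}] \le L \cdot \mathbb{E}[R] + T\sqrt{n} + TC\gamma n = L \cdot O\left( L \sqrt{\frac{n}{\gamma} \ln |\mathcal{A}|} + C \gamma n \right)$, so the same setting of $\gamma$ yields $\mathbb{E}[R_{\text{cost}}] = O\left( L^{\frac{5}{3}} (C \ln |\mathcal{A}|)^{\frac{1}{3}} n^{\frac{2}{3}} \right)$.

□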


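Theorem 13. Algorithm OG$^{\text{o}}$, run with WMR as the subroutine experts algorithm, has $\mathbb{E}[R_{\text{coverage}}] = O\left( T (|\mathcal{A}| \ln |\mathcal{A}|)^{\frac{1}{3}} n^{\frac{2}{3}} \right)$ (when run with input $L = T$) and has $\mathbb{E}[R_{\text{cost}}] = O\left( (T \ln n)^{2} (|\mathcal{A}| \ln |\mathcal{A}|)^{\frac{1}{3}} n^{\frac{2}{3}} \right)$ (when run with input $L = T \ln n$) in the opaque feedback model.

Proof. We showed in the main text that $\mathbb{E}\left[ \hat{x}^i_{t,a} \right] = \frac{\gamma}{L |\mathcal{A}|} x^i_{t,a} + \delta^i_t$, where $\hat{x}^i_{t,a}$ is the estimated payoff fed back by OG$^{\text{o}}$ and $x^i_{t,a}$ is the true payoff. Thus by Lemma 5, $\mathbb{E}[r_t]$ is bounded by $\frac{L |\mathcal{A}|}{\gamma}$ times the worst-case regret of $\mathcal{E}_t$. Using the same argument we used in the proof of Theorem 11 we get $\mathbb{E}[R] = O\left( L \sqrt{\frac{n}{\gamma'} \ln |\mathcal{A}|} \right)$, where $\gamma' = \frac{\gamma}{L |\mathcal{A}|}$. By the corresponding coverage theorem in the main text, we have $\mathbb{E}[R_{\text{coverage}}] \le \mathbb{E}[R] + \gamma n = O\left( L \sqrt{\frac{n}{\gamma'} \ln |\mathcal{A}|} \right) + C \gamma' n$, where $C = L |\mathcal{A}|$. As in the proof of Theorem 11, setting $\gamma' = \left( \frac{L}{C} \sqrt{\frac{\ln |\mathcal{A}|}{n}} \right)^{\frac{2}{3}}$ then yields $\mathbb{E}[R_{\text{coverage}}] = O\left( (C \ln |\mathcal{A}|)^{\frac{1}{3}} (Ln)^{\frac{2}{3}} \right) = O\left( T (|\mathcal{A}| \ln |\mathcal{A}|)^{\frac{1}{3}} n^{\frac{2}{3}} \right)$, and the same setting of $\gamma'$ yields $\mathbb{E}[R_{\text{cost}}] = O\left( L^{\frac{5}{3}} (C \ln |\mathcal{A}|)^{\frac{1}{3}} n^{\frac{2}{3}} \right) = O\left( (T \ln n)^{2} (|\mathcal{A}| \ln |\mathcal{A}|)^{\frac{1}{3}} n^{\frac{2}{3}} \right)$.

□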


We now prove lower bounds on regret. As mentioned in the main text, our lower bounds will hold for the online versions of Max $k$-Coverage and Min-Sum Set Cover.

We consider the following online version of Max $k$-Coverage. One is given a collection $\mathcal{C}$ of sets, where each set in $\mathcal{C}$ is a subset of a universe $E = \{e_1, e_2, \ldots, e_n\}$. One cannot examine the sets (or even determine their cardinalities) directly. On round $i$ of the game, one must specify a subcollection $\mathcal{C}_i \subseteq \mathcal{C}$, with $|\mathcal{C}_i| = k$. One then receives a reward of 1 if element $e_i$ belongs to some set in the subcollection, and receives a reward of zero otherwise. One then learns as feedback which sets $e_i$ belonged to.

This problem is a special case of the online version of Budgeted Maximum Submodular Coverage. To see this, let $\mathcal{V} = \mathcal{C}$ be the set of activities, and think of the action $(v, \tau)$ as including the set $v$ in the collection assuming $\tau \ge 1$, and having no effect otherwise. For any schedule $S$, let $f_i(S) = 1$ if one of the sets added to the collection by $S$ contains $e_i$, and let $f_i(S) = 0$ otherwise. Then Budgeted Maximum Submodular Coverage on the sequence of jobs $\langle f_1, f_2, \ldots, f_n \rangle$, with time limit $T = k$, is exactly the problem just described.

The online version of Min-Sum Set Cover is similar, except that instead of specifying a subcollection of cardinality $k$, one specifies a sequence of $k$ sets from $\mathcal{C}$. One then incurs a loss equal to the number of sets one must look through in the sequence in order to find $e_i$, or a loss of $k$ if $e_i$ does not appear in the sequence at all. By the arguments just given, this is equivalent to online Min-Sum Submodular Cover on the sequence of jobs $\langle f_1, f_2, \ldots, f_n \rangle$, where $T = k$ is the time at which schedule costs are truncated.
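The per-round reward and loss of these two online games can be written out directly. The sketch below represents each set as a Python `set` of elements; the function names are hypothetical.

```python
def coverage_reward(chosen_sets, element):
    """Online Max k-Coverage: reward 1 if any chosen set contains the element."""
    return 1 if any(element in s for s in chosen_sets) else 0


def min_sum_loss(sequence, element):
    """Online Min-Sum Set Cover: number of sets examined until the element is
    found, or len(sequence) if the element never appears."""
    for position, s in enumerate(sequence, start=1):
        if element in s:
            return position
    return len(sequence)
```

For example, with the sequence `[{1, 2}, {3}, {4, 5}]`, element 3 gives reward 1 and loss 2, while element 9 gives reward 0 and loss 3.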

To prove lower bounds on regret, we will require the following technical lemma. The proof is a straightforward generalization of the proof of Lemma 3.2.1 of [6], which considered the special case $p = \frac{1}{2}$.


Lemma 7 ([6]). Let $X_1, X_2, \ldots, X_s$ be $s$ independent random variables, where $X_i$ equals the number of heads in $n$ flips of a coin with bias $p$. Let $\mu = np$ and let $\sigma = \sqrt{np(1 - p)}$. Then

\[
\mathbb{E}\left[ \max \{ X_1, X_2, \ldots, X_s \} \right] = \mu + \Omega\left( \sigma \sqrt{\ln s} \right) .
\]
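A small Monte Carlo experiment gives a feel for the $\sigma \sqrt{\ln s}$ scaling in Lemma 7. The sketch below is only an illustration with hypothetical names and parameters; it estimates the excess of the expected maximum over $\mu$ and does not verify the asymptotic statement.

```python
import math
import random


def mean_max_excess(s, n, p, trials, seed=0):
    """Monte Carlo estimate of E[max{X_1..X_s}] - n*p for X_i ~ Binomial(n, p)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Each X_i is simulated as n independent coin flips with bias p.
        best = max(sum(rng.random() < p for _ in range(n)) for _ in range(s))
        total += best
    return total / trials - n * p


if __name__ == "__main__":
    n, p = 50, 0.5
    sigma = math.sqrt(n * p * (1 - p))
    for s in (2, 16, 128):
        print(s, mean_max_excess(s, n, p, trials=200), sigma * math.sqrt(math.log(s)))
```

Printing the estimated excess next to $\sigma \sqrt{\ln s}$ for increasing $s$ shows the two quantities growing together.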

Theorem 14. Any algorithm for online Max $k$-Coverage has worst-case expected 1-regret $\Omega\left( \sqrt{Tn \ln \frac{|\mathcal{V}|}{T}} \right)$, where $\mathcal{V}$ is the collection of sets and $T = k$ is the number of sets selected by the online algorithm on each round.
Proof. Let $\mathcal{V}$ be a collection of sets. On each round of the online game, whether or not a given set covers the element will be determined by flipping a coin of bias $p = \frac{1}{2T}$. Thus, regardless of which $T$ sets are selected by the online algorithm, the probability that it covers the element is $q = 1 - \left(1 - \frac{1}{2T}\right)^{T} \in \left[ 1 - e^{-1/2}, \frac{1}{2} \right]$, and the expected number of elements the online algorithm covers is $nq$.

We now consider the number of elements that could have been covered in hindsight. Let $R = \sqrt{\frac{n}{T} \ln \frac{|\mathcal{V}|}{T}}$. Partition $\mathcal{V}$ into $T$ bins, each of size $s = \frac{|\mathcal{V}|}{T}$. Let $S_i^*$ denote the set in the $i$th bin which covers the largest number of elements, and let $C^* = \{S_1^*, S_2^*, \ldots, S_T^*\}$. To prove the theorem, it suffices to show that $C^*$ covers $nq + \Omega(TR)$ elements in expectation.

Let a collection $C = \{S_1, S_2, \ldots, S_T\}$ consist of a random set drawn from each bin. In expectation $C$ covers $nq$ elements. Let $x_i := |S_i^*| - |S_i|$ and note that $x_i \ge 0$ and $\mathbb{E}[x_i] = \Omega(R)$ by Lemma 7. Randomly mark $x_i$ elements of $S_i^*$ and let $M_i$ and $U_i$ denote the marked and unmarked elements of $S_i^*$, respectively. Note that the collection $\{U_i : 1 \le i \le T\}$ covers $nq$ elements in expectation. Let $X$ denote the (random) number of additional elements covered by the collection $\{M_i : 1 \le i \le T\}$ (i.e., $X = \left| \bigcup_i S_i^* - \bigcup_i U_i \right|$). We claim that $\mathbb{E}[X] = \Omega(TR)$. To prove this, define $\mathcal{E}$ to be the event "for all $S \in C$, $|S| \le n/T$" and let $Y$ be the number of marked elements covered exactly once in $C^*$. We will show that $\mathbb{E}[Y \mid \mathcal{E}] \cdot \mathbb{P}[\mathcal{E}] = \Omega(TR)$. Since $\mathbb{E}[Y \mid \mathcal{E}] \cdot \mathbb{P}[\mathcal{E}] \le \mathbb{E}[Y] \le \mathbb{E}[X]$, this is sufficient to complete the proof.

Fix $i$ and any element $e \in M_i$. Then $\mathbb{P}[e \text{ uniquely covered} \mid \mathcal{E}] = \prod_{j \ne i} \left(1 - |S_j^*| / n\right) \ge (1 - 1/T)^{T-1} \ge 1/e$. This implies $\mathbb{E}[Y \mid \mathcal{E}] \ge \frac{1}{e} \sum_i \mathbb{E}[|M_i|] = \Omega(TR)$, since, as mentioned, $\mathbb{E}[|M_i|] = \Omega(R)$ for all $i$. Finally, the Chernoff bound easily yields $\mathbb{P}[\mathcal{E}] \ge (1 - |\mathcal{V}| \cdot \exp\{-n/8T\}) = 1 - o(1)$, and so $\mathbb{E}[Y \mid \mathcal{E}] \cdot \mathbb{P}[\mathcal{E}] = \Omega(TR)$ as claimed. □


The lower bound in Theorem 14 is optimal up to constant factors. To see this, observe that running randomized weighted majority with one expert for each of the $\binom{|\mathcal{V}|}{T}$ possible collections of $T$ sets yields worst-case regret $O\left( \sqrt{n \ln \binom{|\mathcal{V}|}{T}} \right) = O\left( \sqrt{nT \ln \frac{|\mathcal{V}|}{T}} \right)$ for online Max $k$-Coverage, using the fact that $\binom{|\mathcal{V}|}{T} \le \left( \frac{|\mathcal{V}| e}{T} \right)^{T}$. Similarly, using a separate expert for each of the $O\left( |\mathcal{V}|^{T} \right)$ possible permutations of $T$ sets yields regret $O\left( T \sqrt{Tn \ln |\mathcal{V}|} \right)$ for online Min-Sum Set Cover, which shows that the lower bound in Theorem 15 is optimal up to logarithmic factors.
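The binomial estimate used in this argument is easy to check numerically. The sketch below compares $\ln \binom{|\mathcal{V}|}{T}$ with $T \ln \frac{|\mathcal{V}| e}{T}$ for sample values; the function name and the sample values are hypothetical.

```python
import math


def log_binomial_bound(num_sets, k):
    """Return (ln C(num_sets, k), k * ln(num_sets * e / k)) for 1 <= k <= num_sets."""
    exact = math.log(math.comb(num_sets, k))
    upper = k * math.log(num_sets * math.e / k)
    return exact, upper


if __name__ == "__main__":
    for num_sets, k in ((100, 10), (1000, 5), (20, 20)):
        print(num_sets, k, log_binomial_bound(num_sets, k))
```

Taking logarithms turns the number of experts into the $T \ln \frac{|\mathcal{V}| e}{T}$ term that appears under the square root in the regret bound.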


Theorem 15. Any algorithm for online Min-Sum Set Cover has worst-case expected 1-regret $\Omega\left( T \sqrt{Tn \ln \frac{|\mathcal{V}|}{T}} \right)$, where $\mathcal{V}$ is a collection of sets and $T$ is the number of sets selected by the online algorithm on each round.


Proof. We use the same construction as in the proof of Theorem 14. Define the coverage time of a schedule $S_i = \langle S_1^i, S_2^i, \ldots, S_T^i \rangle$ to be the smallest $t$ such that $S_t^i$ covers the $i$th element, or $T$ if no such $t$ exists. As in the proof of Theorem 14, the probability that the online algorithm covers any particular element is $q$. Given that the online algorithm covers an element, the expected coverage time is $zT$ for some $z \le 1$. Thus, any online algorithm has expected coverage time $\bar{t} = qzT + (1 - q)T$ for each element.

Now consider the schedule $S^* = \langle S_1^*, S_2^*, \ldots, S_T^* \rangle$, where $S_t^* = U_t \cup M_t$ was defined in the proof of Theorem 14, and let the sets be indexed in random order. The schedule $U = \langle U_1, U_2, \ldots, U_T \rangle$ is statistically equivalent to a random schedule, and thus has expected coverage time $\bar{t}$ per element. Using $S^*$ in place of $U$ causes $X$ additional elements to be covered, where $\mathbb{E}[X] = \Omega\left( \sqrt{Tn \ln \frac{|\mathcal{V}|}{T}} \right)$. Because the sets in $S^*$ are ordered randomly, the expected coverage time for each of the $X$ additional elements is at most $\frac{T}{2}$. Thus, the total expected coverage time of $S^*$ is smaller than that of $U$ by at least $\frac{T}{2} \mathbb{E}[X] = \Omega\left( T \sqrt{Tn \ln \frac{|\mathcal{V}|}{T}} \right)$. □